The Effectiveness of Intelligence Tests in the
Selection of Workers


Edwin E. Ghiselli and Clarence W. Brown 
University of California 


Of the various types of tests that have been used in the selection of 
workers, undoubtedly intelligence tests in their varied forms have been 
most frequently utilized. To some extent this assertion is borne out by 
the number of reports of the effectiveness of intelligence tests in the selec- 
tion of workers which appear in the professional literature. Indeed, the 
authors have been able to locate some 185 instances of reports of the 
effectiveness of these tests in the selection of employees in various occupa- 
tions. These instances include only those investigations where results 
were given in some statistical fashion indicating the degree of relationship 
between the test scores and some index of proficiency on the job. If one 
considers the additional number of investigations dealing with the rela- 
tionship between intelligence test scores and success in industrial training, 
labor turnover, and the like, it is apparent that this number is an under- 
estimate. 

To the above also must be added the unreported, but undoubtedly 
numerous, instances where intelligence tests have been used for employ- 
ment purposes without any verification whatsoever of their effectiveness. 
One finds both in industry and in governmental agencies the indiscrimi- 
nate use of intelligence tests for the selection of workers at nearly every 
occupational level. The apparent reasoning in the blind use of these 
tests is that while they may not prove to be effective in selecting better 
workers for the specific job under investigation, at least they will do no 
harm. Stated another way, in any employment situation the relation- 
ship between intelligence test scores and job proficiency can be expected 
to be either positive or zero but will not in any statistical sense be signifi- 
cantly negative. 

It is obvious that this notion needs empirical verification before it can 
be accepted. If, for example, it were found that the relationship between 
intelligence test scores and measures of job success were invariably either
positive or insignificantly different from zero, then the use of these tests
under such a point of view would receive some justification. On the 
other hand, if significant negative relationships were frequently found 
then the notion could be considered of doubtful application. It would, 
of course, be preferred that all indiscriminate use of tests cease, but this 
state of affairs, if ever attained, will require many years of education in 
scientific test principles. In the meantime we should learn what past 
applications will reveal concerning the value and limitations of any test. 


Basic Data and Methods 


To examine the hypothesis that intelligence tests will give either posi- 
tive or zero predictions of occupational success, but will not give negative 
predictions, the authors surveyed the various professional journals and 
books for instances wherein intelligence test scores had been checked 
against job proficiency measures. In addition, they obtained instances 
from unpublished data collected by several industrial organizations. 

The correlations sought and employed in this analysis were between 
test scores and some index of proficiency on the job. The criteria against 
which the tests were validated included ratings of proficiency by super- 
visors, actual production figures, or some similar measure indicative of 
job proficiency. Relationships between test scores and success in train- 
ing, such as apprentice training, were not included. Similarly, valida- 
tions using as criteria the number of inservice promotions or raises were 
not considered. Job tenure as a criterion was accepted for inclusion only 
when the employees in the separated group were released because of 
failure to perform adequately on the job. 

In all, 185 validity coefficients were collected. These coefficients 
were segregated according to the occupational groups of the workers from 
which they were obtained. To test the statistical significance of the 
departure of these validity coefficients from zero, each one was transmuted 
into an equivalent Fisher’s z’ and then divided by the standard error of 
a z’ of zero value for the same number of cases. The approximate formula
for the standard error of z’, the reciprocal of the square root of N-3,
was used.
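
The significance test just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' own computation: the function names are hypothetical, and the two-tailed normal critical values (1.960 and 2.576) are assumed, since the text does not say how the 5% and 1% levels were read off.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z' transformation of a correlation coefficient r."""
    return math.atanh(r)


def critical_ratio(r: float, n: int) -> float:
    """z' divided by the standard error of a zero z', which is 1 / sqrt(N - 3)."""
    return fisher_z(r) * math.sqrt(n - 3)


def classify(r: float, n: int) -> str:
    """Label a validity coefficient by direction and significance level.

    Assumption: two-tailed normal critical values, 1.960 (5%) and 2.576 (1%).
    """
    ratio = critical_ratio(r, n)
    if abs(ratio) >= 2.576:
        level = "1%"
    elif abs(ratio) >= 1.960:
        level = "5%"
    else:
        return "not significantly different from zero"
    direction = "positive" if ratio > 0 else "negative"
    return f"{direction}, significant at the {level} level"
```

With invented values, `classify(0.35, 100)` gives a critical ratio of about 3.6 and so falls in the "positive, significant at the 1% level" class, while `classify(-0.09, 50)` is not significantly different from zero.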


Results


In Table 1 are presented the median validity coefficients of intelligence
tests for various occupational groups together with the number of coefficients
on which the median was computed. It is clear that for certain
of these occupational groups, namely, clerical workers, sales clerks, semiskilled
workers, and unskilled workers, the data are sufficient to draw
rather definite conclusions. For clerical workers the relationship between
test scores and job proficiency is only moderately high, being .35.
In most cases a validity coefficient of this order would be considered as
minimally acceptable for a single test. For sales clerks and semiskilled
and unskilled workers the median coefficients are too low to indicate any
general usefulness of intelligence tests in these occupations. With sales
clerks there is even a strong indication of a negative relation. In the
case of supervisors and skilled workers there is a suggestion that intelligence
tests might be helpful in selecting workers in these groups.


Table 1

Median Validity Coefficients of Intelligence Tests for Various Occupational
Groups in the Prediction of Job Proficiency

                        Median        Number of
                        Validity      Validity
Occupational Group      Coefficient   Coefficients
Clerical workers        .35           85
Supervisors             .40           9
Salesmen                              4
Sales clerks            -.09          18
Protective service      .25           6
Skilled workers         .55           6
Semiskilled workers     .20           45
Unskilled workers       .08           13


However indicative these median validity coefficients might be, they 
are at best only a rough approximation to the predictive effectiveness 
that intelligence tests might have in any particular employment situation. 
Furthermore, the findings from different reported studies with respect to 
the effectiveness of a particular type of test for the selection of workers 
for a given job are only infrequently in close agreement. Rather, the 
validity coefficients reported by different research workers for the same 
type of test and job are likely to vary over a considerable range of values. 
Figure 1 presents a graphic picture of the distribution of validity coefficients
for intelligence tests for the different occupational groups. The
marked variation in findings reported by different investigators is fully 
revealed by this figure. To some extent this variation can be attributed 
to differences in the types of intelligence tests used, to differences in the 
reliability of the criteria employed in validation, to differences in the 
homogeneity of occupational groups, and to sampling error. However, it 
seems likely that much of the variation in effectiveness of the intelligence 
tests must be ascribed to differences in the demands and requirements 
set for a job which in different organizations would appear to be similar in 
name only. In other words, it would appear that the job and worker 
specifications for a particular job vary to a marked extent from one establishment
to another. Thus, the fact that intelligence tests were found
to be effective in selecting workers for a particular job in one or more
establishments would give no assurance that they would be equally useful
for a job with the same name in other establishments. There is posed
here the problem of getting more adequate job and worker analyses
and specifications in the framing and describing of the validating criteria.


Fig. 1. Distributions of validity coefficients of intelligence tests for
different occupational groups. [Histograms, one panel per occupational
group (panels legible here: Clerical Workers, Supervisors); horizontal axis:
validity coefficient, -.50 to +.75; vertical axis: number of validity
coefficients.]

In Figure 1 it will be seen that for clerical workers, supervisors, and
unskilled workers the validity coefficients range from about zero to as high
as .60 to .70. In the case of semiskilled workers, while some of the
coefficients reach this top limit, a few are negative and reach as low as
-.35. With skilled workers the coefficients are generally high, and for
salesmen and protective service workers the coefficients are fairly homogeneous
and are of the order of .30. The one striking difference appears
in the case of sales clerks. With this group thirteen of the eighteen
validity coefficients are negative and range as low as -.60, the highest
positive coefficient being .25.


Table 2

Significance of Validity Coefficients of Intelligence Tests for Various
Occupational Groups

                        Number of Validity Coefficients

                        Positive and      Not             Negative and
                        Significant at    Significantly   Significant at
                                          Different
Occupational            1%       5%       from            5%       1%
Group                   Level    Level    Zero            Level    Level    Total
Clerical workers        48       11       25                                84
Supervisors             5        2        2                                 9
Salesmen                3        1                                          4
Sales clerks                     1        11              1        5        18
Protective service               2        4                                 6
Skilled workers         5        1                                          6
Semiskilled workers     20       1        23              1                 45
Unskilled workers       2        2        9                                 13
All workers             83       21       74              2        5        185


A measure of the statistical significance of the validity coefficients of 
the intelligence tests for the various occupational groups is given in 
Table 2. It will be noted from this table that for clerical workers, super- 
visors, salesmen, and skilled workers the majority of the coefficients are 
significantly different from zero in the direction of a positive relation- 
ship. For protective service workers and unskilled workers the majority 
of the coefficients are not significantly different from zero. In the case
of semiskilled occupations only one of the 45 validity coefficients approaches
significance in the direction of a negative relationship while the
remainder are about equally divided between a significant positive relationship
and no significant difference from zero. For sales clerks a third of the
18 coefficients are significant in the negative direction, five of these being
significant at the 1% level. Only one coefficient is significant in a positive
direction.


Summary and Conclusions 


An overall picture is presented of the validity of intelligence tests for 
the selection of workers in eight occupational groups. Despite wide 
variations in the kinds of tests used, the excellence of the test administra- 
tions, the adequacy of the criteria, and similar factors, there still appear 
certain tendencies in the data analyzed which may prove useful in future 
utilization of intelligence tests in selection. Keeping the foregoing limita- 
tions in mind, together with the number of validity coefficients analyzed, 
the following statements seem justified: 


1. For clerical workers intelligence tests are very useful selection in- 
struments. 

2. For supervisors, salesmen and skilled workers, intelligence tests 
show high promise of being very useful instruments but further knowledge 
of their effectiveness is needed. 

3. For sales clerks and unskilled workers intelligence tests are of 
little service in selection. 

4. For semiskilled workers intelligence tests may prove of some value 
when used in combination with other tests but as single instruments they 
show little promise. 

5. For workers in the protective service intelligence tests may prove 
useful when combined with other tests but at present there is need of 
further data before a definite conclusion can be made. 


Received March 29, 1948. 


Evaluation of a Clerical Applicant Testing Program *


William James Giese
William James Giese, Ph.D., and Associates, Chicago 3, Illinois 


and 


Frances Weigle 
David C. Cook Publishing Company, Elgin, Illinois 


All employment testing programs should be critically and systemati- 
cally reviewed to learn if the tests are measuring the capacities, profi- 
ciencies, etc. which are significantly related to job success. Such a study 
will point out any needs for changes in the program or determine whether 
or not the program is worth continuing. 

At the David C. Cook Publishing Company the applicant load is 
relatively low. Employment tests, however, have been used as general 
interviewing aids. When sufficient data become available, standards or 
“local norms” in terms of test scores can be set up if there is a high relationship
between the test scores and employee desirability. Such information
will be especially useful when applicants become more
plentiful.

Employment tests have been used by Cook’s for about 8 years. The 
company retained the services of a consulting firm for the purpose of 
installing psychological tests for the selection of clerical personnel in the 
early part of 1940. The Clerical Test D was given to most clerical 
employees and the test scores related to merit ratings. However, these 
early data on present personnel are not available. 

Since 1940, the Clerical Test D and the StenoGaugE have been given 
to nearly all of the applicants for clerical or stenographic positions. Other 
tests have been given to many of the employees, but only the two tests
mentioned had sufficient data to permit an evaluation of their usefulness 
as selection and placement aids. 

These tests were usually administered by the Personnel Manager from
1940 to November 1943, and since November 1943 they have been ad- 
ministered by a personnel assistant who has an A.B. in Psychology or by a 
qualified consultant. 


*The authors wish to express their appreciation to the David C. Cook Publishing
Company for their interest and cooperation in making this article possible. 


Method


To find out how useful these tests had been in selecting desirable 
personnel the authors investigated the relationship between scores made 
on the StenoGaugE and the Clerical Test D at the time of employment 
and subsequent experience with the people as employees. 

They considered as possible criteria objective records of performance
such as length of service, production, absenteeism, accidents, and errors.
They also considered systematic but nonobjective records such as merit
ratings, willingness to rehire at time of termination, and estimates of
promotability.

In deciding which of the available material just mentioned would be 
practical to use, the following standards were used: meaning in terms of 
final results, consistency and probable accuracy of the records, number 
of employees involved, and accessibility of the data. 


Results 


From the data which were practical to use, the authors found the 
StenoGaugE to be helpful in measuring typing and spelling proficiency 
since the test relates positively to both supervisors’ ratings and super- 
visors’ willingness to rehire. 

The correlation between scores on the StenoGaugE and supervisors’
ratings is .61 ± .10. Figure 1 illustrates how well the StenoGaugE
identifies those applicants at the time of employment who will be rated
high and those who will be rated low by their supervisors after 3 months
on the job.


Fig. 1. How the applicants’ scores on the StenoGaugE at the time of
employment relate to supervisors’ rating after three months of service.
[Chart crossing supervisors’ ratings (upper 56%, lower 44%) with StenoGaugE
scores (upper 58%, lower 42%); N = 43, r = .61 ± .10.]


Figure 2 shows that there is a positive relationship between the supervisors’
willingness to rehire employees who have terminated and the
employee’s score on the StenoGaugE at the time of employment. Figure
2 also shows a positive relationship between remaining with the company
and the score made on the StenoGaugE at the time of employment. Of
the people who have left the company, those with higher scores on the
StenoGaugE tend to work longer before terminating, although r between
length of service before termination and scores on the StenoGaugE is
only .18 ± .11 (SE for an r of .0).


Fig. 2. Relationship between scores on the StenoGaugE and
turnover (January 1940 to August 1947). [Panels: Scores of Employees
Who Have Terminated Who Are Eligible for Rehire; Scores of Employees
Who Have Terminated Who Are Not Eligible for Rehire; Scores of Employees
Who Are with the Company as of August 1947. N = 18 is marked on the figure.]


From these relationships it was concluded that the StenoGaugE is
doing a reasonably good job of measuring proficiencies which are crucial
to job success in the typing positions.

From the data which were practical to use, the authors found the re- 
lationship between Clerical Test D and supervisors’ ratings of clerical 
employees to be not nearly as high as was the relationship between the 
StenoGaugE and the typists’ ratings by their supervisors. The Pearson 
Product-Moment Correlation is .39 ± .10 between supervisors’ ratings
and test scores at the time of employment. Figure 3 illustrates, in 
graphic form, this relationship. 


Fig. 3. How the applicants’ scores on the Clerical Test D at the time of
employment relate to supervisors’ rating after three months of service.
[Chart crossing supervisors’ ratings (upper 51%, lower 49%) with Clerical
Test D scores (upper 49%, lower 51%); N = 72, r = .39 ± .10.]


No relationship was found between length of service of office workers 
and score on Clerical Test D. The correlation is .0. 
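
The article does not say how the "± .10" figures attached to the two rating correlations were obtained. As a hedged check, the sketch below assumes the classical large-sample standard error of a correlation, (1 - r²)/√N; some texts use √(N - 1) instead, which gives the same values to two decimals here. The function names are hypothetical. For the turnover correlation the text states that the standard error is that of a zero r, shown here as 1/√N; the N for that correlation is not reported, so it is not computed.

```python
import math


def standard_error_of_r(r: float, n: int) -> float:
    """Assumed large-sample standard error of r: (1 - r^2) / sqrt(N)."""
    return (1.0 - r * r) / math.sqrt(n)


def standard_error_of_zero_r(n: int) -> float:
    """Standard error of r when the true correlation is taken as zero: 1 / sqrt(N)."""
    return 1.0 / math.sqrt(n)


# r and N as reported for the supervisors' ratings of each test.
for label, r, n in [("StenoGaugE", 0.61, 43), ("Clerical Test D", 0.39, 72)]:
    print(f"{label}: r = {r:.2f}, SE = {standard_error_of_r(r, n):.2f}")
```

Under this assumption both reported correlations come out with a standard error of .10, in agreement with the figures printed in the text.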

With regard to turnover, there is a low relationship between the
supervisors’ willingness to rehire a terminated employee and the employee’s
Clerical Test D score at the time of employment. Figure 4
illustrates this relationship.

From these relationships it was concluded that Clerical Test D is doing 
a poor job of measuring those capacities which are significant to job success 
in the general office. 


Summary and Recommendations 


On the basis of these findings the following recommendations were 
made to the David C. Cook Publishing Company: 

1. The StenoGaugE should continue to be used as an employment
test for the typing jobs, requiring as high a test score as the selection ratio
will permit. A revision of the scoring system should be considered which
will give the test more differentiation at the higher levels.


Fig. 4. Relationship between scores on the Clerical Test D and
turnover (January 1940 to August 1947). [Panels legible here: Scores of
Employees Who Have Terminated Who Are Eligible for Rehire; Scores of
Employees Who Are with the Company as of August 1947. N = 78 is marked
on the figure.]


2. The Clerical Test D should be dropped. An analysis of the 
various jobs for which it was being used as a predictor revealed they were 
not general clerical jobs, but were jobs which were more likely to be 
primarily filing, comparing numbers, etc., or jobs which were primarily 
computational. Apparently, the test does not measure reliably or validly 
this type of clerical ability. Furthermore, the scoring is rather involved 
and subject to error. 

3. For the more strictly clerical jobs, an intelligence test and a clerical 
aptitude test are recommended. 

4. For those office jobs which are primarily computational in nature,
an intelligence test and an arithmetical proficiency test are recommended. 

5. Since employment testing is established and accepted it is recom- 
mended that it be expanded to all applicants at the hourly rated and 
nonexempt salary levels for those jobs which demand capacities that are 
practical to measure in the employment office. The increase in cost 
would be negligible and pertinent test information might direct and 
shorten interviewing time. 

Received May 15, 1948.


Use of the “Group Situation Observation” Method in the 
Selection of Trainee Executives 


Ronald Taft 
The Institute of Industrial Management, Melbourne, Australia 


A recurrent problem in the planned programs for the selection and
training of young executives is that of predicting the likely future development
of the potential trainee while he is still a youth. This article
describes the application of the group situation observation technique to
this problem of selection. This technique was originally used in the
German Army selection procedures, and adopted (and adapted) by the
British (3) and Australian Armies (4). The U. S. Army O. S. S. also
utilized the basic principles in connection with the selection of personnel
for operations behind enemy lines (6). Since the conclusion of the War,
it has been applied to the selection of trainee industrial foremen, managers
and civil service administrators, mainly in Britain (1, 2). The present
report deals with the application of the technique to a group whose age
(17 to 19 years) is well below that of other reported uses.

The position for which the candidates were being considered was that 
of trainee production executive, in a shoe factory with 200 employees. 
Two trainees were required. Because of the long-range nature of the 
training program, no exact definition of the traits required by these 
trainees was attempted, but the selectors were familiar with the factory 
and the approximate duties which would be required of the future 
executives. 


Procedure 


The screening procedure prior to the group observation sessions is 
given briefly to provide a background to the data available to the selectors. 

Written applications from 63 persons were received as a result of 
newspaper advertisements, and 13 of these were rejected without inter- 
view on educational grounds. The Managing Director of the Company 
then gave an orientation and screening interview to the remaining 
candidates, as a result of which 11 were rejected as “unsuitable types”
and 5 withdrew their applications. Eleven failed to report for this 
interview. 

The remaining 23 applicants were then given a vocational guidance 
interview by the writer, at which time they were given the following tests: 
Vocational Interest Questionnaire; Personal Questionnaire “L” (Hanawalt
and Richardson); Oral and Written Directions (Adaptation of
Army Alpha); H Test (short form) (Adaptation of Army Alpha); Speed
and Accuracy (Minnesota Vocational Test for Clerical Workers); Space
Form Perception (Australian Institute of Industrial Psychology); and
Mechanical Comprehension (Bennett AA). This was followed by a
half-hour interview. Two, however, failed to report for this interview.

Seven more applicants were rejected at this stage on grounds of 
interest, temperament or ability, including all those with a score of less 
than the 60th centile on general population norms for the H test. 
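
The attrition through the screening stages can be tallied as a quick consistency check on the counts reported above. The sketch below is illustrative only: the stage labels paraphrase the text and the function name is hypothetical, while the numbers are those given in the preceding paragraphs.

```python
# Each stage is (label, change in the number of candidates still under consideration).
STAGES = [
    ("written applications received", 63),
    ("rejected without interview on educational grounds", -13),
    ("rejected by the Managing Director as unsuitable", -11),
    ("withdrew their applications", -5),
    ("failed to report for the screening interview", -11),
    ("failed to report for the vocational guidance interview", -2),
    ("rejected on interest, temperament or ability", -7),
]


def running_totals(stages):
    """Return (label, candidates remaining) after each stage is applied."""
    remaining = 0
    totals = []
    for label, change in stages:
        remaining += change
        totals.append((label, remaining))
    return totals


for label, remaining in running_totals(STAGES):
    print(f"{remaining:3d}  after: {label}")
```

The running total stands at 23 after the Managing Director's screening interview and at 14 after the vocational guidance stage, matching the figures stated in the text.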


Group Situation Examination 


The remaining 14 candidates were invited by mail to be present at the 
home of the Managing Director “to spend the day with him in connection
with your application for employment.” One failed to attend. The 
others were divided into three separate groups, of four or five, each group 
being arranged for either a Saturday or a Sunday. During the group 
situation they were under the observation of the Managing Director and 
the writer (henceforth referred to as the Psychologist), the latter control- 
ling the day’s proceedings. 

The following program was observed. 


Step   Time Period      Activity
1.     11.45 to 12      Introduction.
2.     12 to 12.15      Personal History.
3.     12.15 to 12.45   Game: “Who am I?”
4.     12.45 to 1.45    Lunch.
5.     1.45 to 3.30     Group Rorschach Test.
6.     3.30 to 4.15     Leaderless Discussion.
7.     4.15 to 4.30     Afternoon Tea.
8.     4.30 to 5        Problem Situation Discussion.
9.     5 to 5.30        Personality Judgments of Self and other Candidates.
10.    5.30             Closing Address by the Managing Director.


1. Introduction. Candidates were welcomed and introduced to each 
other by the Managing Director, and a brief word on the procedure was 
given by the Psychologist. They were asked to try to adopt an informal 
attitude and to refer to each other by their first names. They were 
warned that it is “impossible to beat the system,” so that it would be in 
their best interests to try to be natural right from the beginning rather
than to bluff their way through. 

2. Personal History. Each candidate was asked in turn to “introduce 
yourself to the others by stating briefly your personal history.” No 
further instruction was given, and they were called on in order of age, 
starting from the oldest. At the conclusion of these short outlines, the 
candidates were given an opportunity of asking questions about the
others, but a total of only four questions was asked. 

This procedure was of some value in giving the candidates a brief 
outline of their colleagues’ background; and also in indicating which 
factors the candidates considered significant in their lives. However, 
there was a tendency to adopt the pattern followed by the first speaker, 
and it was necessary in evaluating the contributions to consider this 
factor. Thus credit was given to a fourth speaker who broke away from 
an unsatisfactory habit adopted by the prior speakers of speaking about 
their schools rather than themselves.¹

Indications about the candidates obtained from this procedure mainly 
related to self-confidence, particularly while in a situation calculated to 
unsettle them; also their ability to select salient factors, and to follow 
an independent line. 

3. Game: “Who am I?” Candidates were then informed that they
were to play a game commonly known as “Who am I?” or “Personalities.”
They were instructed as follows: “One person is to leave the room, and 
the others are to imagine that they represent a well-known personality, 
either living or dead. The person leaving the room should be brought 
back and should endeavour to find out who the personality is, by asking 
each one of the others in turn a question the answer to which is either 
‘Yes’ or ‘No.’ You should keep on asking questions until you have 
narrowed down the field, and you are allowed only one guess. I will not 
give you any further instructions, and you should work out any other 
details yourselves. Continue with this game until each one of you has 
had a turn.” Whenever any questions were asked they were reminded
that they were “on their own.”

This test appeared to be particularly useful as a means of introducing 
the group to the leaderless group situation, as on each occasion problems 
regarding the observance of the rules were raised. Information obtained 
from this session related to the ability of the candidates to get their 
opinions accepted, attitudes towards the observance of rules, flexibility, 
intelligence, concentration, reaction to frustration, impulsiveness (tend- 
ency to guess rather than analyse), persistence, sympathy with the 
difficulties met by others, extent of general knowledge, identification with 
famous people, and so on. For example, when the questioner made an 
incorrect guess, it was useful to observe how the others responded to the 
rule that only one guess should be permitted. 

4. Lunch. During lunch the Managing Director and the Psychologist
endeavoured to take part in the conversation and to make the atmosphere
informal. Lunch commenced as a standing buffet to permit candidates
who were attracted to each other to come together, and a “mental” note
was made of their social and individual behaviour.

¹ In referring to information obtained as a result of the various tests, the writer has in
mind the notes made by the observers at the time, but no attempt was made to infer
particular characteristics from the one test only.

5. Group Rorschach. This was not properly a group situation test, 
but was introduced at this stage of the selection procedure for convenience 
only. Also it was felt that doing this test would help to break down
tension, by strengthening the feeling on the part of all the candidates that 
they were going through the same trial together. 

The use made of the Rorschach interpretations was similar to the use 
made of the aptitude tests; that is, it was primarily a screening device, 
intended to cull out those with definite neurotic symptoms. In this 
respect one was rejected as too uncontrolled and one as too inhibited, the 
latter giving only eight responses. 

6. Leaderless Discussion. The candidates were seated in a circle, with 
the Managing Director and Psychologist at the side. They were told to 
regard the latter as “merely pieces of furniture,” and that they were now
to discuss any topic on which they might decide. No further instructions 
were given. 

The group dynamics involved in the selection of the topic itself provided 
valuable material. This test was also useful for observing how the 
subject stands up to argument, whether he perseveres or shows resistance 
to persuasion and whether he becomes emotional. In two of the three 
groups a dominant person seemed to arise at this juncture, and an op- 
portunity was afforded for observing whether the form of domination was 
“autocratic” or “integrative” (in the sense used by Lewin). 

7. Problem Situation Discussion. This discussion differed from the 
previous one only in so far as it was more structured, that is, the group 
was given an actual assignment. The candidates were given the facts 
about the hours of work at the factory at which they had applied for the 
position, and were asked to report their recommendations back to the 
Managing Director on how they considered these hours should be altered 
to arrange a 40-hour week. (The factory was previously working a 
44-hour week.) 

This discussion again gave scope for observing tendencies in certain 
of the candidates to dominate their group. It also was revealing about 
the knowledge of the candidates as to the general situation in industry, 
and their attitude towards management and employees (this was con- 
sidered in conjunction with their previous experience and home back- 
ground). 

The main difference between the leaderless discussion and the problem 
situation discussion is that the former gives more scope for the individual 
to show his personality and ability qua individual, while the latter stresses 
rather the individual as a member of a group, the members of which are
all motivated towards the same end, that is, finding the solution to the
problem.

8. Personality Judgments. The candidates were then instructed as 
follows: “It is an important part of the duties of a factory manager to be 
able to sum up other people and himself objectively, and if necessary, 
ruthlessly. You should now write a thumb-nail sketch of the other
candidates and yourself, with particular regard to their personalities with 
respect to the position of trainee factory executive. All of your reports 
will be anonymous and will not sway our judgment either against or in 
favour of any particular candidate.” They were seated at separate 
tables for this task in order to reduce any inhibitions that may have arisen 
from the close proximity of the persons being rated. 

The judgments made varied considerably in quality, and revealed 
varying willingness to unmask personalities. The insight possessed by 
the candidates also appeared to vary considerably. 

9. Report on the Proceedings. In his closing address the Managing 
Director requested the candidates to forward to him by mail a report re- 
counting the proceedings of the day, and giving their impressions of what 
had occurred. These reports provided an indication of each candidate’s 
judgment, ability to write a report on factual occurrences, powers of 
observation, memory for details, and maturity in evaluating a situation. 


Evaluation of the Candidates 


At the conclusion of each day’s observations, the Managing Director 
and the Psychologist discussed and tentatively evaluated the candidates 
in terms of their suitability for the position in question. Following the 
practice used in the evaluation of O. S. S. candidates (5), they were not 
judged on their comparative levels on a number of traits, but they were 
discussed rather in terms of their weak and strong points as shown in the 
various situational tests conducted during the day. 

When the Rorschach tests had been scored and the reports received 
from the candidates a final selection conference was held. All the avail- 
able information and reports on the candidates were considered, with 
particular weight given to the group observation data, since the other 
data had already been used for screening purposes. 


Evaluation of the Procedure 


A consideration of the validity of the group observation procedure 
involves two major questions: (a) How well does it predict the ultimate 
success of the candidates? and (b) Does it add anything to the predictive 
power of the usual test battery plus interview? 

It would be difficult to answer either of these questions in the absence 
of criteria provided by long-range longitudinal studies. However, in 
respect to question (b) it may be of value to compare the rankings made 
by the Psychologist after the vocational guidance interview with the 
overall rankings made at the completion of the selection procedure. 
These are set out in Table 1. 


Table 1 


Showing Comparative Rankings of Candidates by the Psychologist 
After the Vocational Guidance Interview and the Overall Ranking 
After the Completion of the Selection Procedure 


Rank Order after      Overall       Change after 
Voc. Guid. Interv.    Rank Order    Group Observ. 

        2                  1            +1 
        1                  2            -1 
        5                  3            +2 
        3                  4            -1 
       10                  5            +5 
        7.5                6            +1.5 
                           7            +2 
        4                  8            -4 
       11                  9            -2 
        7.5               10            -2.5 
       12                 11            +1 
       13                 12            +1 
        6                 13 


Candidates A and B, the selected candidates, would have been 
chosen as the first two choices without the group observation interview. 
However, there are significant changes in the position of the other 
candidates, and it is possible that such changes could have occurred in 
the case of candidates A and B. 

As far as the individual items of the group observation sessions are 
concerned it is difficult to evaluate their separate contributions to the 
final result, as the day’s proceedings have been viewed as a unit which 
develops progressively. 


Criticisms 


1. The group situation used in selection is so variant from the actual 
situation as to be worthless as a basis for drawing inferences, if not 
actually misleading. 

It is pointed out however that there is sufficient correspondence 
between the “artificial” and the “actual” situations to expect similar 
samples of behaviour. For example, it would be expected that a candi- 
date whose logic deteriorated as a result of emotional involvement in the 
leaderless discussion would show similar reactions in the everyday rela- 
tionships with other factory executives. 

2. Inferences are not permissible from the ability of a candidate to 
lead the other candidates to his ability to lead a group of factory workers. 

This criticism is unavoidable in any form of selection excepting that of 
trial and error, and it is believed that the differences between the two 
social groups were constantly borne in mind by the observers. 

3. The group observation technique assumes consistency of behaviour 
from one situation to another (i.e. test-retest reliability) without regard to 
temporary moods or reactions to unusual circumstances. 

However, if the candidate shows up badly during the observation, it 
seems a reasonable assumption that there will be occasions on the job 
when he will do likewise. 


Viewpoints on the Procedure 


The Managing Director felt that the group observation procedure had 
given him an opportunity to participate fully in the selection procedure, 
and to obtain a preview of his potential employees’ behaviour. It had 
also eliminated much of the esoteric aura that has surrounded the work of 
the psychologist as seen by the layman. 

The reports submitted by the candidates showed that they too con- 
sidered the procedure a particularly just one, eight of the fourteen stating 
this explicitly. Several of them also revealed in their remarks signs of 
the self-clarification which has been noted by other writers on this subject. 
(This “self-clarification” can be compared to the insight which develops 
as a result of participation in role-playing.) One typical remark was “I 
shall always remember today as a day of enlightenment and experience 
in my life.” 


Summary 


The problem of developing future executives for industry frequently 
requires a planned program involving the selection of potential executives 
from amongst comparatively young and untried persons. The usual 
methods of psychologically testing and interviewing candidates are limited 
by the difficulty of inferring social behaviour traits (such as dominance, 
cooperativeness, ability to persuade, stability in the face of emotional 
stress, sound judgment, etc.). 

During World War II the Group Situation Observation Method was 
devised, mainly by the British Army psychologists, to meet this difficulty 
in the selection of officers, and the method is now being applied to the 
selection of industrial and administrative executives. An application of 
this technique has been described where the problem was to select two 
trainee factory executives for a small shoe factory. The candidates were 
first screened by means of aptitude tests and an interview and were then 
divided into groups of four or five for observation. The full day’s 
procedure included a personal introduction by each candidate, a group 
Rorschach Test, an unstructured and a structured discussion period, and 
personality ratings by each candidate of the others. 


Received June 4, 1948. 


Cautions Concerning the Use of the Taylor-Russell 
Tables in Employee Selection 


Max Smith 
The City College of New York 


The importance of the selection ratio in determining the practical 
effectiveness of tests in selection is pointed out in an article by Taylor 
and Russell (2). In the article the authors also present tables for esti- 
mating the degree of such effectiveness when the validity scattergram rep- 
resents a normal bivariate surface. With this type of correlation distri- 
bution, when the use of a small selection ratio is feasible, even tests of low 
validity may be highly effective in selecting better employees. The con- 
clusion, however, does not apply to every type of validity scattergram. 
Overlooking this fact may lead to unjustified reliance upon a small selec- 
tion ratio as a possible substitute for high validity in obtaining better 
employees. 

It is the purpose of this article to present three considerations which 
may help to avoid unwarranted reliance upon low validity coefficients or 
misuse of the Taylor-Russell tables: First, where a triangular scatter- 
gram is found between test scores and criterion scores, there may fre- 
quently exist a definite limit beyond which further reduction in the 
selection ratio will have no additional value. Second, regardless of the 
foregoing likelihood, the specific probabilities indicated in the Taylor- 
Russell tables are quite inapplicable to triangular scattergrams. Third, 
even in the case where an approximately normal, elliptical scattergram 
does exist, the Taylor-Russell tables as ordinarily used will often yield 
an inaccurate estimate of the prospective gain in effectiveness. 


Limited Value of the Selection Ratio 
with Triangular Scattergrams 


Concerning the value of a low selection ratio, one widely used in- 
dustrial psychology text says: 


“. . . in group testing, a reduction of the selection ratio is a substitute for 
high validity. This statement . . . mean(s) that if the test has any significant 
validity, however small, it is possible for the employer to get the same func- 
tional value from it that he could get from a test of any validity, however 
high, if he is able sufficiently to reduce the selection ratio” (3, p. 69). 


The foregoing statement is theoretically true when the test scores 
and the criterion scores yield a normal bivariate distribution. But often 
in vocational prediction the correlation surface is definitely not normal. 
The scattergrams found are frequently triangular in shape, not elliptical 
(3, p. 80). Nevertheless, the author goes on to say: 


“A question may be raised as to whether the reasoning on pages 66-75 
in which the significance of the selection ratio concept was discussed, is valid 
when the scattergram is triangular rather than oval in shape. This reasoning 
applies equally well regardless of the shape of the scattergram as long as a 
vertical line drawn through the distribution at any point will divide the indi- 
viduals plotted so as to give a higher average criterion score to those on the 
right of the line than to those on the left. This situation is certain to occur 
when a positive correlation between test scores and criterion exists. Thus, the 
rather common existence of a triangular scattergram does not invalidate the impor- 
tance of the selection ratio” (Italics added) (3, p. 80). 


On the contrary, this article will attempt to show that “the rather 
common existence of a triangular scattergram” does in certain circum- 
stances “invalidate the importance of the selection ratio,” and that in 
general the Taylor-Russell tables are inapplicable to such distributions. 

The mere fact that a positive correlation between test scores and 
criterion exists is no guarantee whatsoever that a vertical line drawn 
through the distribution at any given point will divide the individuals 
plotted so as to give a higher average criterion score to those at the right 
of the line. Often the triangularity of a scattergram indicates a fairly 
high predictive value for low test scores, accompanied by negligible pre- 
dictive value for higher test scores. In such situations, reducing the 
selection ratio below 100 per cent will at first be more effective than the 
Taylor-Russell tables would lead us to expect. But after a certain selec- 
tion ratio has been reached, testing more individuals in order to reject a 
larger proportion of them will have very little practical effect. Further 
practical effectiveness in reducing the proportion of unsatisfactory 
workers will now require predictive measures of increased validity. 

The relation between visual acuity and production among gaugers 
furnishes an excellent example of such a situation: 


“Figure 59 shows . . . the type of relation between near acuity and pro- 
duction rating for 177 gaugers. The average rating increases with higher 
acuity scores, up to 7. Above that point the average rating does not vary 
systematically. The production averages of all groups scoring 7 or better in 
near acuity are higher than are those of any groups scoring below 7. On this 
test no minimum requirement higher than 7 would improve the discrimination 
value of the test. It is as though acuity of 7 is adequate to do this job, and 
better acuity is not needed” (3, p. 204). 


The application of the foregoing facts to the question of the selection 
ratio seems quite clear. Testing the near acuity of enough applicants 
so that none with a score below 7 need be hired as a gauger would increase 
the effectiveness of selection. (Alternatively, perhaps, corrective lenses 
might be prescribed for those with lower scores.) Reducing the selection 
ratio beyond that point would have no additional value. Further refine- 
ment of selection would depend upon finding measures which would show 
some correlation with rated production among applicants whose visual 
acuity score was 7 or higher. 
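
The plateau described in the gauger example can be illustrated with a small sketch. The ratings below are hypothetical figures invented for illustration (they are not the data behind Figure 59), and the function name is likewise an assumption. The sketch computes the average production rating of those who would be hired at each minimum acuity score; once the minimum reaches 7, raising it further leaves the average unchanged.

```python
# Hypothetical data: acuity score -> production ratings of workers with that score.
# The mean rating rises with acuity up to a score of 7 and is flat above it.
RATINGS_BY_SCORE = {
    4: [50, 55, 60],
    5: [55, 60, 65],
    6: [60, 65, 70],
    7: [70, 75, 80],
    8: [70, 75, 80],
    9: [70, 75, 80],
    10: [70, 75, 80],
}


def mean_rating_of_selected(ratings_by_score, minimum_score):
    """Average criterion rating of everyone scoring at or above the minimum."""
    selected = [
        rating
        for score, ratings in ratings_by_score.items()
        if score >= minimum_score
        for rating in ratings
    ]
    return sum(selected) / len(selected)


if __name__ == "__main__":
    for minimum in sorted(RATINGS_BY_SCORE):
        mean = mean_rating_of_selected(RATINGS_BY_SCORE, minimum)
        print(f"minimum score {minimum:2d}: mean rating {mean:.1f}")
```

In this invented example the average rises as the minimum score moves from 4 to 7 and then stays constant, which is the sense in which a smaller selection ratio stops paying once the threshold has been passed.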

In the above example the application of the Taylor-Russell tables 
would obviously be unwarranted. If, however, a personnel executive 
had been misled into believing that a triangular scattergram “does not 
invalidate the importance of the selection ratio,” he might wrongly place 
reliance on the tabled expectancies instead of searching for more valid 
measures. To the extent that such reliance led to complacent satisfaction 
with inadequate validity coefficients, it might perhaps be the very factor 
that prevented the possibility of real improvement in selection. 


Inapplicability of the Probability Estimates 
to Triangular Scattergrams 


Even when a triangular scattergram does represent a definitely in- 
creasing average criterion score accompanying increasing average test 
scores, the specific figures in the Taylor-Russell tables are not applicable. 
As Taylor and Russell (2, p. 571) explain, their tables are based on 
Pearson’s “Tables for Finding the Volumes of the Normal Bivariate Sur- 
face” and consequently assume a normal bivariate distribution. To the 
extent that a given distribution departs from this assumption the specific 
probabilities in the Taylor-Russell tables do not hold. This point is 
obvious and need only be affirmed to be recognized. For students, 
though, it should be made quite explicit in order to insure its not being 
overlooked. 
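
To make the normal-bivariate assumption concrete, the sketch below computes the kind of entry such a table contains directly from the model: the proportion satisfactory among those selected, given the proportion satisfactory in the whole group, the validity coefficient, and the selection ratio. It is an illustration under stated assumptions, not Taylor and Russell's own computing method: the function name, the midpoint-rule integration, and the integration range of 10 standard units above the test cutoff are all choices made here.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def proportion_satisfactory_among_selected(base_rate, r, selection_ratio, steps=4000):
    """P(criterion above its cutoff | test above its cutoff), bivariate normal model.

    base_rate       -- proportion satisfactory in the whole group (0 < base_rate < 1)
    r               -- validity coefficient (-1 < r < 1)
    selection_ratio -- proportion of the group selected (0 < selection_ratio < 1)
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    x_cut = STANDARD_NORMAL.inv_cdf(1.0 - selection_ratio)  # test cutoff
    y_cut = STANDARD_NORMAL.inv_cdf(1.0 - base_rate)        # criterion cutoff
    residual_sd = (1.0 - r * r) ** 0.5
    width = 10.0 / steps  # integrate over test scores from x_cut to x_cut + 10
    joint = 0.0
    for i in range(steps):
        x = x_cut + (i + 0.5) * width  # midpoint of the i-th slice
        # Given test score x, the criterion is normal with mean r*x and SD residual_sd.
        p_satisfactory = 1.0 - STANDARD_NORMAL.cdf((y_cut - r * x) / residual_sd)
        joint += STANDARD_NORMAL.pdf(x) * p_satisfactory * width
    return joint / selection_ratio


if __name__ == "__main__":
    print(round(proportion_satisfactory_among_selected(0.70, 0.30, 0.40), 3))
```

When r is zero the function simply returns the base rate. Every figure it produces rests on the bivariate normal form, so none of them carries over to a triangular scattergram.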

Statements such as “This reasoning applies equally well regardless of 
the shape of the scattergram . . .” and “. . . the rather common 
existence of a triangular scattergram does not invalidate the importance 
of the selection ratio,” however, not only fail to make the point; they 
might easily be interpreted by the average student as implying that the 
Taylor-Russell tables are applicable to triangular scattergrams. 

The use of the product-moment coefficient of correlation with tri- 
angular scattergrams is ordinarily inadvisable in the first place, since the 
assumption of rectilinearity is generally not fulfilled. To enter the 
Taylor-Russell tables with such a coefficient and to make practical de- 
cisions concerning selection ratios on the basis of the figures one finds is 
likely to lead only to misdirected effort and to ultimate disappointment. 

In many practical examples throughout his Industrial Psychology, 
Tiffin does illustrate sensible, realistic procedures, which students can 
profitably imitate. Despite his statement that “the rather common 
existence of a triangular scattergram does not invalidate the importance 
of the selection ratio,” he does not in practice apply the Taylor-Russell 
tables to such distributions. Nevertheless, in view of his explanation 
of the use of these tables without cautioning concerning their possible 
misuse, it is not unlikely that some of his student readers may be misled. 


Use of the Tables with Elliptical Scattergrams 


When the test scores and the criterion scores yield an approximately 
normal bivariate distribution (which will lead to an elliptical scatter- 
gram), the use of the Taylor-Russell tables is justified. But in using 
the tables, we must remember that they assume the applicant group and 
the present employee group to be similarly constituted (2, p. 576). This 
is equivalent to assuming that at present one hundred per cent of appli- 
cants are being hired and retained or that our current selection procedures 
have zero validity or that both conditions prevail. 

The first assumption is almost certainly unwarranted as a rule. What 
firm hires and keeps on the job every individual who applies? Nor is it 
always to be expected that the current selection procedures are completely 
worthless, though sometimes they may be. 

A more likely situation is that at present some applicants are being 
rejected for one reason or another and that the current selection procedure 
has some degree of validity in reducing the number of unsatisfactory 
employees. The number of applicants accepted and the number re- 
jected will probably be on record; so the present selection ratio is known. 
The validity coefficient of the current selection procedures may not be 
known, but it is nevertheless operant in reducing the proportion of un- 
satisfactory employees as compared with the proportion that would exist 
if no selection whatsoever were made among applicants. 

Let us use a hypothetical example to see by how much the prospective 
effectiveness of selection might be overestimated if the Taylor-Russell 
tables were used without correcting for the assumptions just indicated. 

We shall assume that at present 73 per cent of our employees are 
satisfactory according to our criterion. What proportion of new em- 
ployees would be satisfactory if we were to use selection procedures that 
yield a validity coefficient of .30 with our criterion and if we were able to 
limit our selection to the best 40 per cent of applicants according to our 
test standards? 

Using the tables without concerning ourselves about current validities 
or selection ratios, we conclude that we may expect a decrease in the pro- 
portion of unsatisfactory workers from .27 to .18, which means a reduc- 
tion of one-third in the number of unsatisfactory employees. 

If, however, we hypothesize also that in the total applicant group the 
actual validity coefficient of current selection procedures with our criterion 
is .20 and that our present selection ratio is .60, we find that we may 
expect a decrease in the proportion of unsatisfactory workers from .27 to 
.22, not to .18 as we had inferred from an uncritical use of the tables. In 
other words, we may expect a reduction of 19 per cent (5/27) in the 
number of unsatisfactory employees, not a reduction of one-third. 

The procedure for using the Taylor-Russell tables with due allowance 
for a known pre-existing validity coefficient and selection ratio is ex- 
plained in the following paragraphs set in smaller type. When, as is 
often the case, we don’t know the validity coefficient of our current pro- 
cedures, about all we can conclude is that the Taylor-Russell tables will 
overestimate the amount of gain—but we cannot say by how much. 


Using the Taylor-Russell Tables (2, 3 Appendix B) with Allowance for 
Pre-existing r and Selection Ratio 


For the sake of brevity we shall use the following terms and symbols: 
The proportion of present employees considered satisfactory, we shall call 
“OK.” The proportion who will be satisfactory among those selected, we 
shall term “new OK.” The validity coefficient is r; and the selection ratio, SR. 

We hypothesize OK = .73, r = .30, SR = .40. First, let us use the tables 
without concerning ourselves about pre-existing validities or selection ratios. 

Since there is no table for OK = .73, we shall interpolate¹ between the 
table for OK = .70 and the table for OK = .80. In table OK = .70, for 
r = .30 and SR = .40, the new OK = .80; in table OK = .80, the correspond- 
ing new OK = .88. So by interpolating, we conclude that we may expect 
about 82 per cent of the new employees to be satisfactory instead of 73 per cent 
as at present. In other words, we may expect a decrease in the proportion of 
unsatisfactory workers from .27 to .18, which means a reduction of one-third 
in the number of unsatisfactory employees. 

Now, in addition to hypothesizing OK = .73, r = .30, SR = .40, we assume 
also a pre-existing r of .20 and SR of .60. What new OK may we now expect? 

In this case the table to use would not be three-tenths of the way from 
OK = .70 to OK = .80, as it was when we assumed zero validity and/or a 
hundred per cent selection ratio. The correct table to use would be one in 
which a change from r = 0 or SR = 1.00 to r = .20, SR = .60 would have 
brought about a new OK of .73 (which is our hypothetical present OK). 
That is, the correct table to use would be one in which the entry is .73 at the 
point where the row r = .20 intersects the column SR = .60. However, there 
is no table in which this occurs; so we shall have to interpolate between two 
tables. 

In table OK = .60, at the point where row r = .20 intersects column 
SR = .60 the entry is .65; in table OK = .70, the corresponding entry is .75. 
Since .73 is eight-tenths of the way from .65 to .75, to find the new OK for 
other validities and selection ratios we shall interpolate at a point eight-tenths 
of the distance between the value found in table OK = .60 and that found in 
table OK = .70. For our hypothesized r = .30 and SR = .40, the entry in 
table OK = .60 is .71; in table OK = .70, the corresponding entry is .80. 
The new OK we may expect then is about .78, not .82 as we had inferred from 
an uncritical use of the tables. In other words we may expect a reduction of 
19 per cent (5/27) in the number of unsatisfactory employees, not a reduction 
of one-third. 


1 Linear interpolation gives a sufficiently close approximation. 
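
The two interpolations worked through above can be retraced in a few lines. The table entries used are only those quoted in the text; the function and variable names are illustrative assumptions. Following the text, each new OK is rounded to two decimals before the reduction in unsatisfactory employees is computed.

```python
def interpolate(low, high, fraction):
    """Linear interpolation: the value `fraction` of the way from low to high."""
    return low + fraction * (high - low)


def relative_reduction(old_unsatisfactory, new_unsatisfactory):
    """Fractional drop in the proportion of unsatisfactory employees."""
    return (old_unsatisfactory - new_unsatisfactory) / old_unsatisfactory


PRESENT_OK = 0.73

# Uncritical use: .73 lies three-tenths of the way from table OK = .70 to
# table OK = .80, whose entries for r = .30, SR = .40 are .80 and .88.
fraction_uncritical = (PRESENT_OK - 0.70) / (0.80 - 0.70)
new_ok_uncritical = round(interpolate(0.80, 0.88, fraction_uncritical), 2)

# With allowance for the pre-existing r = .20 and SR = .60: the entries at that
# row and column are .65 (table OK = .60) and .75 (table OK = .70), so .73 lies
# eight-tenths of the way between the two tables.
fraction_corrected = (PRESENT_OK - 0.65) / (0.75 - 0.65)
new_ok_corrected = round(interpolate(0.71, 0.80, fraction_corrected), 2)

reduction_uncritical = relative_reduction(1 - PRESENT_OK, 1 - new_ok_uncritical)
reduction_corrected = relative_reduction(1 - PRESENT_OK, 1 - new_ok_corrected)

if __name__ == "__main__":
    print(f"uncritical: new OK {new_ok_uncritical:.2f}, reduction {reduction_uncritical:.0%}")
    print(f"corrected:  new OK {new_ok_corrected:.2f}, reduction {reduction_corrected:.0%}")
```

The same two-step pattern applies to any other validity and selection ratio: first locate the present OK between two tables at the pre-existing r and SR, then interpolate by that fraction at the proposed r and SR.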


There is an additional consideration which frequently interferes with 
accurate estimation from the Taylor-Russell tables, though this time in 
the direction of underestimation. The validity coefficient with which one 
is supposed to enter the tables is that prevailing among the entire range 
of applicants. But very seldom will criterion measures be available for 
all applicants. If (as the tables assume) the applicant group and the 
present employee group are similarly constituted, the validity coefficient 
among the present employee group may be substituted. But if (as is 
probably more often the case) the present employee group is more homo- 
geneous than the applicant group, the substituted r will ordinarily be 
smaller than the r among all applicants. And so we have an additional 
element of inaccuracy in using the Taylor-Russell tables. 
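
The article gives no formula for this effect. One standard way to express it is the range-restriction relation sketched below. It assumes that selection acts directly on the test score and that the regression of criterion on test is linear with equal scatter throughout. The function names and the example figures are illustrative assumptions, not values from the article.

```python
import math


def restricted_validity(r_applicants, sd_ratio):
    """Validity expected in the selected group.

    r_applicants -- validity coefficient among all applicants
    sd_ratio     -- test-score SD of the selected group divided by that of
                    all applicants (less than 1 for a more homogeneous group)
    """
    scaled = r_applicants * sd_ratio
    return scaled / math.sqrt(1.0 - r_applicants ** 2 + scaled ** 2)


def corrected_validity(r_employees, sd_ratio):
    """Inverse relation: estimate the applicant-group validity from the
    coefficient observed among present employees."""
    scaled = r_employees / sd_ratio
    return scaled / math.sqrt(1.0 - r_employees ** 2 + scaled ** 2)


if __name__ == "__main__":
    observed = restricted_validity(0.50, 0.60)
    print(f"r among employees: {observed:.2f}")
    print(f"r recovered for applicants: {corrected_validity(observed, 0.60):.2f}")
```

Under these assumptions a coefficient of .50 among all applicants shrinks to roughly .33 when the employee group has only six-tenths of the applicants' spread in test scores, so entering the tables with the employee-group r would understate the gain.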


Comment 


In the present state of development of vocational selection the Taylor- 
Russell tables are of value primarily in drawing attention to the impor- 
tance of utilizing the selection-ratio concept rather than in furnishing 
dependable specific probabilities of expected improvement in effectiveness 
of selection. If comprehensive factor studies like those conducted by 
the Army Air Forces (1) ultimately result in relatively pure tests of im- 
portant factors and if reliable criteria can be obtained, industrial psycholo- 
gists may perhaps be able to build up reasonably valid predictive equa- 
tions based on multiple indicators. Then they may be in a position to 
make effective use of theoretical distributions and tables like Taylor- 
Russell’s. 

Meanwhile it would seem that, for vocational selection, the realistic 
way to make the best use of the relationships shown in a predictive scat- 
tergram is to compare analytically the distributions of the individual 
arrays within the scattergram, so that optimal critical scores can be set. 
(Statistical analysis of variance will help distinguish between chance 
and real significance.) Then, if possible, enough applicants should be 
tested so that there will be a sufficient number with scores above (or 
between) the critical points to fill all vacancies. 


Spatial Relations Ability and Other Characteristics of 
Art Laboratory Students 


William B. Dreffin and C. Gilbert Wrenn 
University of Minnesota 


One characteristic reaction of the veteran population in college and 
university today is the demand for vocational preparation. As a result 
of this need it has been the growing conviction of the University of 
Minnesota General College that the curriculum should offer not only a 
broad base of general education but also an increasing number of vocation- 
ally oriented sequences designed for sub-professional work (3). A con- 
clusion of this survey of Minnesota’s educational needs was that in many 
vocational areas there are five sub-professional jobs to each profes- 
sional one. 

This paper is concerned with the selection phase of a proposed voca- 
tional sequence in commercial art. It is an attempt to determine some 
of the criteria that could be used in the selection of students for such a 
training sequence. 

It is believed that some of these criteria may be sought in the spatial 
relations ability of art laboratory students, in their scholastic aptitude, 
and in their vocational interest profiles. Barrett (1) found that the 
Revised Minnesota Paper Form Board Test, Series BB, the Strong 
Vocational Interest Blank, women’s form, the Meier Art Judgment Test, 
and the Allport-Vernon Scale of Values all discriminated between art 
majors and non-art students at Hunter College. These same tests are 
suggested for the identification of art ability in the article on that subject 
in Kaplan’s Encyclopedia (2, pp. 59-63). Welch found that a test of 
creative thinking differentiated sharply between professional artists and 
college students (5). Maturity may have been a factor here. 

The following tests and inventories, regularly administered to all 
General College freshmen, were available for evaluation: ACE Psycholo- 
gical Examination, 1937 Form; Ohio State University Psychological Test, 
Form 22; and the Strong Vocational Interest Blank, both men’s and 
women’s forms. To determine spatial relations ability the Revised 
Minnesota Paper Form Board Test, Series MA, was administered to all 
art laboratory students. 

The sample studied consisted of 69 art students in the art laboratory 
course of General College (43 men, 26 women) who had a definite voca- 
tional objective in commercial art. Specific vocational goals included 
within this broad area were architecture, interior decorating, designing, 
illustrative advertising, and the teaching of art. Because the manual 
of the Paper Form Board Test does not give norms for commercial artists or 
architects the authors administered this test to a group of advanced 
architectural students and to a group of advanced commercial art 
students (Art Education) in the University of Minnesota. This was 
done upon the assumption that spatial relations ability is one factor in 
the aptitude for art and that these students would possess this ability to 
a relatively larger degree than corresponding individuals in other curricula 
or in the general population. 


Spatial Relations Ability 


Table 1 summarizes the basic statistics of the groups studied with 
regard to spatial relations ability. 


Table 1 


Means and Standard Deviations of the Specific Groups Studied and of the Published 
Norm Groups on the Revised Minnesota Paper Form 
Board Test, Series MA 


Group                                    N      Mean     S.D. 

This Study: 
  Art Laboratory Students                69      45       7.2 
  Commercial Art Students                29      46       4.3 
  Architectural Students                 27      51       7.0 

Published Norms (original form): 
  General Population                    100      31      11.5 
  Liberal Arts Freshmen                 247      38       8.5 
  Fifth Year Engineers                  238      46       8.0 


Table 2 summarizes the significance of the differences, in terms of 
variances and means, between the art laboratory students and liberal 
arts freshmen, commercial art students, and architectural students. 


Table 2 


Significance of the Difference in Terms of Variances and Means between the Art 
Laboratory Students and Liberal Arts Freshmen, Commercial Art 
Students, and Architectural Students, Respectively 


                                     “F” Test of Variances     “t” or “d” Test of Means 
                                     between Art Laboratory    between Art Laboratory 
                                     Students and Other        Students and Other 
Group                        N       Groups                    Groups 

Liberal Arts Freshmen       247      F = 1.38   P > .05        t = 7.22   P < .01 
Commercial Art Students      29      F = 2.83   P < .01        d =  .69   P > .05 
Architectural Students       27      F = 2.39   P < .01        d = 2.29   P < .01 

P < .01 indicates a significant difference at the one per cent level of confidence. 
P > .05 indicates a non-significant difference at the five per cent level of confidence. 


As a group, art laboratory students are significantly superior to the 
average liberal arts freshman, are not significantly different from the 
average commercial art student, and are inferior to the average architec- 
tural student in spatial relations ability. 

The authors were interested not only in determining the spatial rela- 
tions ability of art laboratory students, but also in ascertaining whether 
there was a significant relationship of spatial relations ability to art 
achievement and to probable professional success. Each student in the 
art laboratory was assigned two grades at the end of the quarter in which 
this study took place. One was a measure of his art laboratory achieve- 
ment and the other was a measure of the instructor’s considered judgment 
of that student’s potential success in a commercial art vocation. Each 
student worked independently during the quarter in the area in which 
he felt himself most deficient and the instructor’s course grade was in 
terms of individual improvement in this area. Because of this it might 
be expected that the correlation between spatial relations ability and 
the grades in art laboratory would be lower than the correlation of spatial 
relations with the instructor’s judgment of the student’s probable pro- 
fessional success. This was not the case, however, for the two coefficients 
were found to be .37 and .36 respectively, both significant at the five 
per cent level. 
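
The significance of coefficients of this size can be checked with the usual t ratio for a correlation coefficient. The sketch below is an illustration, not the authors' computation: it assumes that both grades were available for all 69 art laboratory students (the text does not state the number on which the coefficients rest), and it uses 2.0 as an approximate two-tailed five per cent critical value for about 67 degrees of freedom. The function name is hypothetical.

```python
import math

APPROX_CRITICAL_T = 2.0  # approximate two-tailed 5 per cent value near 67 d.f.


def t_for_correlation(r, n):
    """t ratio for testing a product-moment r against zero, with n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    for r in (0.37, 0.36):
        t = t_for_correlation(r, 69)  # n = 69 is an assumption, see above
        verdict = "exceeds" if t > APPROX_CRITICAL_T else "does not exceed"
        print(f"r = {r:.2f}: t = {t:.2f}, which {verdict} the critical value")
```

On these assumptions both ratios lie above 3, which is consistent with the statement that the coefficients are significant at the five per cent level.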


Scholastic Aptitude 


It was found that art laboratory students as a group are average in 
scholastic aptitude when compared to General College students as a whole, 
but, in terms of all university freshmen, they are considerably below 
average. Their mean score on the ACE Psychological Exam. and on 
the Ohio State Intelligence Test placed them at the 45th and 50th per- 
centiles respectively in terms of General College norms, and approxi- 
mately at the 4th and 16th percentiles respectively in terms of university 
freshmen. This places them at the average of the general population in 
intellectual ability as measured by these tests. 


Measured Interests 


When the interest profiles of the art laboratory students on the Strong 
Vocational Interest Blank were analyzed by sex it was found: 


1. That 12 per cent of the men scored B+ or A on the Artist scale 
and 14 per cent on the Architecture scale. 

2. That 66 per cent of the men scored B+ or A on one or more of the 
business contact scales. 

3. That 60 per cent of the men scoring B+ or A on one or more of the 
business contact scales scored B- or lower on one or both of the Artist 
and Architecture scales. 

4. That 46 per cent of the women scored B+ or A on the Artist scale 
of the women’s form. 


Strong (4, pp. 685, 699, 716) reports that 48 per cent of his male artist 
norm group were commercial artists, and he found negative correlations 
between the interests of male artists and the interests of men scoring 
high in the business contact area (r’s from -.27 to -.52). These results 
conflict with the interest scores of the male art laboratory students and 
there appear to be two possible interpretations: 


1. The art laboratory students are younger than the commercial 
artists of Strong’s group and as they become older their interests may 
develop in the direction of the commercial artist pattern. This is not 
likely in view of Strong’s work on changes of interest with age. 

2. Art laboratory students, because of the low artist and high business 
contact interest scores, and because of relatively low scholastic aptitude, 
may enter commercial art but at a low professional level. Of course the 
two explanations are not mutually exclusive. They may become com- 
mercial artists but be most occupied with the business end of the vocation. 


Almost one-half of the art laboratory women of the General College 
sample secured A or B+ ratings in the Artist scale of the women’s form 
of Strong’s Inventory. These women had commercial art as an objective 
but one-half had the interest pattern of women artists in general. Only 
20 per cent of the norm group, according to Strong, were commercial 
artists. Their interest patterns are suggestive of art as a vocation (insofar 
as interests are concerned) but the number who will find commercial art 
their field is left in question. 


Commercial Art Curriculum 


There are certain implications for the curriculum to be noted here. 
Art laboratory students in the General College at the University of Min- 
nesota as a group have spatial relations ability equal to that of advanced 
commercial art students. Furthermore, when art laboratory students 
from the General College take advanced work in Art Education (com- 
mercial and teaching fields) they are reported by their instructors to 
achieve satisfactorily in art courses. As a group, however, art laboratory 
students have difficulty in competing with commercial art students in 
other academic fields. The reason for this is apparent in their compara- 
tively low scholastic aptitude test scores. The lack of measured ap- 
propriate interests among the men may also be a factor. 

It would appear feasible, therefore, to set up a commercial art sequence 
in the General College for students possessing spatial relations ability 
well above the average of liberal arts freshmen, with scholastic aptitude 
at about the median or above of General College students, and who have 
a definite vocational goal in commercial art. With regard to measured 
interests the findings of this study are less definitive. Two-thirds of the 
men art students have measured interests similar to the interests of men 
in business contact vocations; approximately one-half of the women art 
students, however, have measured interests similar to women artists. 
Both groups have reasonable prospects of entering the commercial art 
field at some undetermined level. 
Notes on the Validity of the Grove Modification of the 
Kent-Shakow Industrial Formboard Series 


Ruth C. Wylie 
Connecticut College 


All the published data on the Modified Kent-Shakow Formboard 
Series have concerned reliability, discriminative capacity and norms 
rather than the validity of the test.1,2,3,4 The Scovill Manufacturing 
Company and the Cincinnati Employment Center reported that the 
original Kent-Shakow Series was useful in selection of toolmaker appren- 
tices and in discriminating among occupational groups.5,6 Unpublished 
data made available to the writer by Grove indicate that the Modified 
Formboard Series was found to discriminate sharply between a carefully 
selected group of graduate engineers and a group of college men matched 
for ACE scores but lacking in mechanical interests or experience. How- 
ever, no reports have been published on attempts to make predictions of 
vocational success with the Modified Formboard Series. Nor have cor- 
relations of this test with other more commonly used tests of mechanical 
ability been reported. 


Procedure and Results: Job Prediction Study 


Fifty-five men who were entering the Diesel Engine School of the 
United States Submarine Base at New London, Connecticut were tested 
with the Grove Modification of the Kent-Shakow Industrial Formboard 
Series and the Spatial Visualization Factor Test from Thurstone’s Tests 
of Primary Abilities. (These men had been chosen for this special training 
partly on the basis of the Navy Mechanical Aptitude Test and General 
Classification Test. They were a highly selected group, as inspection 
of Table 1 will reveal.) 

1 Grove, W. R. Modification of the Kent-Shakow Formboard Series. J. Psychol., 
1939, 7, 385-397. 

2 Wylie, R. C. The reliability of the Grove Modification of the Kent-Shakow Form- 
board Series. J. appl. Psychol., 1947, 31, 155-159. 

3 Wylie, R. C., Wilson, A. W., and Grove, W. R. High school norms for the Grove 
Modification of the Kent-Shakow Formboard Series. J. appl. Psychol., 1948, 32, 
41-50. 

4 Wylie, R. C. The performance of girls and women on the Grove Modification of 
the Kent-Shakow Formboard Series. J. Psychol., 1948, 25, 99-103. 

5 Bingham, W. V. Aptitudes and aptitude testing. New York: Harper, 1937, p. 138 f. 

6 Paterson, D. G., Schneidler, G. G., and Williamson, E. G. Student guidance tech- 
niques. New York: McGraw-Hill, 1938, pp. 229-233. 
Table 1 

Navy Mechanical Aptitude Test Scores and General Classification Test Scores for 
45* Men Entering Diesel Engine School at the United States 
Submarine Base, New London, Conn. 


     Navy Mechanical Aptitude Test            General Classification Test 

Score Range**   Rating***         N      Score Range**   Rating***         N 

65-71           High             12      65-77           High             11 
55-64           Above average    27      55-64           Above average    25 
45-54           Average           4      45-54           Average           9 
35-44           Below average     1      35-44           Below average     0 
13-34           Very low          1      21-34           Very low          0 


* Of the 46 men who graduated, records on Navy Mechanical Aptitude Test and 
General Classification Test were available for only 45. 

** Navy Standard Scores (Mean = 50; S.D. = 10). 

*** Ratings assigned to Navy Standard Scores on Navy Norm Sheets. 


Forty-six of the fifty-five entering candidates graduated from the 14- 
week course. Nine of the original group were transferred to other special 
schools or duties and did not graduate from Diesel Engine School. As 
far as can be ascertained, their transfer was occasioned by reasons other 
than likelihood of unsatisfactory performance in Diesel Engine School. 
Therefore their records were eliminated from the study rather than being 
included as failures. 

A final grade for each of the 46 graduates was made available to the 
writer. This final grade was an average of partial grades given for 
theoretical and laboratory work in such subjects as: General Motors and 
Fairbanks Morse Engines, Hydraulics, Refrigeration and Air Condi- 
tioning Equipment. 

The correlation between Navy Mechanical Aptitude Test Scores and 
final grades was +.53 ± .07. The correlation between the General Clas- 
sification Test and final grades was +.18 ± .09. The Modified Form- 
board Series correlated with final grades +.47 ± .07. 

Thus it is seen that the Modified Formboard Series predicted Diesel 
Engine School grades with a degree of efficiency not statistically signifi- 
cantly different from the Navy Mechanical Aptitude Test. (Critical 
ratio for the obtained difference = .4.) The Modified Formboard Series 
predicts Diesel Engine School grades somewhat more efficiently than 
does the General Classification Test. The critical ratio for the obtained 
difference was 1.6. 
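
The critical ratios quoted above can be approximately reconstructed. The following is only a sketch: it assumes the classical large-sample standard error of a correlation, (1 - r^2)/sqrt(N), and treats the two coefficients being compared as independent. Neither assumption is stated in the text, and the function names are illustrative. Under these assumptions the ratios come out close to the reported .4 and 1.6.

```python
import math

def se_r(r, n):
    """Classical large-sample standard error of a Pearson r (assumed formula)."""
    return (1 - r * r) / math.sqrt(n)

def critical_ratio(r1, n1, r2, n2):
    """Difference between two r's divided by the standard error of the difference.

    Assumes the two coefficients are independent, which is only an
    approximation here because both were obtained on the same men.
    """
    se_diff = math.sqrt(se_r(r1, n1) ** 2 + se_r(r2, n2) ** 2)
    return abs(r1 - r2) / se_diff

# Values from the text: N = 45 for the Navy tests, N = 46 for the Formboard.
cr_mech = critical_ratio(0.53, 45, 0.47, 46)   # Mechanical Aptitude vs. Formboard
cr_gct = critical_ratio(0.47, 46, 0.18, 45)    # Formboard vs. General Classification
print(round(cr_mech, 1), round(cr_gct, 1))
```

Because both coefficients share the same criterion and the same subjects, a formula for correlated coefficients would be more exact; the independent-sample form is used here only because it needs nothing beyond the figures reported.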
It seems reasonable to expect that the correlations for both the Navy 
Mechanical Aptitude Test and the Modified Formboard Series would be 
higher if a less highly selected group of subjects had been used. The 
range of talent for our group was restricted, not only on the Navy Me- 
chanical Aptitude Test, but also on the Modified Formboard Series. The 
median submariner’s score on the Modified Formboard Series fell at the 
90th percentile of the high school standardization group; and only five 
of the submariners had formboard scores less than the 50th percentile 
of the high school standardization group. These five scores had the 
following percentile ranks based on the high school norms: 45, 45, 40, 
40, 10. 


Table 2 
Correlations Obtained between the Grove Modification of the Kent-Shakow Formboard 
Series and Other More Commonly Used Tests which Apparently Involve 
Mechanical Ability and the Spatial Visualization Factor 


Test with which Formboard 
Series was Correlated            N     Group                                    r 

Navy Mechanical Aptitude Test    45    Submariners entering Diesel             .55 
                                       Engine School 
Thurstone Space Factor Test      55    Submariners entering Diesel             .49 
                                       Engine School 
Minnesota Paper Formboard       215    Boys, grades 9-12, in formboard         .58 
                                       group 
Wechsler Block Designs Test     242    Boys, grades 9-12, in formboard         .56 
                                       standardization group 
Minnesota Mechanical Assembly   200    Adult, male, white penitentiary         .63 
Test, Long Form                        inmates 


Correlations of the Modified Formboard Series with Other Tests 


In a previous article on the Modified Formboard Series7 it was sug- 
gested that this test would probably turn out to be a measure of certain 
aspects of “mechanical ability,” particularly the spatial visualization 


7 Wylie, R. C. The reliability of the Grove Modification of the Kent-Shakow 
Formboard Series. J. appl. Psychol., 1947, 31, 155-159. 
factor which has been found to be important for success in mechanical 
occupations. In this connection it is interesting to inspect available 
correlations between the Modified Kent-Shakow Series and certain other 
more commonly used tests. 

Table 2 gives correlations obtained on various groups between the 
Modified Formboard Series and commonly used “mechanical ability” 
tests. 
Table 3 gives correlations between the Modified Formboard Series 
and certain commonly used tests which are not usually considered to be 
“mechanical ability” or “space factor” tests. 


Table 3 
Correlations Obtained between the Grove Modification of the Kent-Shakow Form- 
board Series and Some Commonly Used Tests of “General 
Intelligence” and “Scholastic Aptitude” 


Test with which Formboard 
Series was Correlated             N     Group                                   r 

Navy General Classification       45    Submariners entering Diesel            .18 
Test                                    Engine School 
Total Mental Factors, Cali-      208    Boys, grades 9-12, in formboard        .33 
fornia Test of Mental                   standardization group* 
Maturity, Short Form 
Verbal Score on Scholastic        74    College Women**                        .33 
Aptitude Test given by College 
Entrance Examination Board 
Math. Score on Scholastic         74    College Women***                       .29 
Aptitude Test given by College 
Entrance Examination Board 


* This group seemed to be comparable in range and distribution of scores to the 
group used in standardizing the California Test of Mental Maturity. 

** On the whole this group was fairly highly selected for verbal aptitude: percentile 
rank of the median score of this group = 72; however, the percentile ranks of the lowest 
and highest scores made by this group were 4 and 98. 

*** This group was not as highly selected for math aptitude as for verbal aptitude: 
the percentile rank of the median score of this group = 52; the percentile ranks of the 
lowest and highest scores made by this group = 7 and 98. 
It is not strictly justifiable to compare sizes of correlation coefficients 
obtained on groups differing greatly in range of talent on the Formboard. 
Nevertheless the trend toward higher correlations in Table 2 suggests 
that the Modified Formboard Series does have more in common with 
“mechanical ability” tests, especially tests which are apparently loaded 
with the space factor, than it does with so-called “scholastic aptitude” 
or “general intelligence” tests which do not contain, or are not so heavily 
weighted with, items requiring space visualization. 


Summary and Conclusions 


It has been shown that: (1) the Grove Modification of the Kent- 
Shakow Industrial Formboard Series predicted success in training on a 
mechanical job in the United States Navy with efficiency comparable to 
the Navy Mechanical Aptitude Test. The correlation between the 
Modified Formboard Series and final grades in Diesel Engine School 
for a highly selected group of candidates was .47; (2) for several different 
groups of subjects, the Modified Kent-Shakow Formboard Series cor- 
relates .5 to .6 with certain other “mechanical aptitude” tests which 
apparently involve the spatial visualization factor; (3) Pearson r’s of the 
order of .2 to .3 have been obtained between the Modified Kent-Shakow 
Formboard Series and “general ability” tests or tests apparently involving 
verbal and mathematical factors to a much greater degree than the 
spatial visualization factor. 
A Study of Two Techniques of Measuring 
“Mechanical Comprehension”* 


W. T. McElheny 
State University of Iowa 


The personnel manager, the vocational counselor, the psychologist, 
and others interested in the selection and placement of persons in work of 
a mechanical nature have long been concerned with the problem of deter- 
mining which of a multitude of mechanical tests should be used for pre- 
diction purposes. One of the primary issues has been whether to employ 
paper and pencil tests or performance tests. The former are the easier 
to give, but there has always been the suspicion that the latter might 
possibly be of more value in predicting success in mechanical occupations. 

This study was designed to provide some indication of the relation be- 
tween responses on performance or apparatus tests and paper and pencil 
devices which, it is assumed, were developed to predict essentially the 
same criterion measures. According to Bennett and Cruikshank (2), 
these are measures of “mechanical comprehension,” the “understanding 
of principles and relationships underlying mechanical operations” (p. 2). 

The Bennett Test of Mechanical Comprehension, Form AA (1) has 
been chosen as representative of the paper and pencil instruments. The 
Purdue Mechanical Assembly Test (3) represents the assembly type test. 
The assumption is made that these two devices are attempting to measure 
essentially the same phenomena but by two quite different testing media: 
paper and pencil, in the case of the former, and assembly, in the case of 
the latter. That this assumption is a reasonable one may be seen when 
both tests are examined in closer detail. In this study, then, the concern 
is with the relationship between performance on the two tests. Do both 
tests, in spite of their differences in form and method, yield much the 
same rank order of performance? 

The Bennett test (1) was devised to measure the capacity of an indi- 
vidual to understand various types of physical relationships. It contains 
sixty pictorially presented mechanical problems, and since no mathema- 
tical or arithmetical computations are required and the verbal or reading 
element is reduced, it is Bennett’s claim that the effect of training and 
formal knowledge is minimized (2, p. 39). 

* From unpublished M.A. thesis, Department of Psychology, State University of 
Iowa, August, 1948. 
The Purdue Mechanical Assembly Test as designed by Graney (3) 
consists of nine problem boxes of equal floor area. In each box a mecha- 
nism may be assembled in such a way that a mechanical action takes 
place. The first subtest serves as an introductory unit to acquaint the 
examinee with the nature of the task to be performed. Each of the sub- 
tests, when properly assembled, constitutes a mechanism which can be 
manually operated from the outside of the box by a crank or push-bar. 
The test was constructed along the broad outlines of the Stenquist Me- 
chanical Assembling Test and the Minnesota Mechanical Assembly Test 
but with the hope of eliminating certain defects present in them. The 
Purdue test has eliminated stereotyped mechanical contrivances in favor 
of new and novel mechanical problem situations, thus ruling out the 
effect of chance familiarity with the task at hand for some persons, 
regardless of age or experience. As Graney (3) has pointed out, however, 
the fact that each sub-test employs in its assembly standard mechanical 
items such as levers, links, cams, gears, pinions, etc., tends to make the 
solution simpler for experienced mechanical workers as opposed to me- 
chanically naive laymen. The principles of assembly in each test are, in 
most instances, drawn from standard design practices. 

Further, the Purdue test is built of parts which are relatively large 
and strong and which must be constructed by skilled craftsmen. While 
this tends to increase the test in bulk, weight, and cost, it likewise tends 
to keep the test in practically the same condition for all test subjects. 

Certain deviations from the Purdue Mechanical Assembly Test as 
designed by Graney (3) were made in the apparatus used in the present 
study. Minor modifications in the construction of two of the tests were 
made, and the subjects were provided with a small screw driver (and 
informed of its purpose) for use on these two tests. Graney’s original 
design does not require the use of such a tool. The results presented here 
are based on scores made on seven boxes rather than the eight boxes 
which constitute the complete Graney test. In this study the assump- 
tion is made that these changes in apparatus do not constitute significant 
changes in the original Graney test. 


Subjects and Experimental Procedure 


Subjects. A statistical analysis has been made of scores achieved by 
100 college students at the State University of Iowa, to whom both the 
Bennett test and the Purdue assembly test were administered. This 
group consisted of eighty male and twenty female subjects, ranging in 
age from eighteen to forty-four, representing every school classification 
according to class, and twenty-two fields of specialization. The male 
group had a modal age of twenty-four, a modal school classification of 
college sophomore, and majors in the College of Commerce made up the 
highest frequency group. Of the female subjects, whose modal age was 
twenty-one and school classification was college senior, majors in psy- 
chology predominated. Without exception, the cooperation and effort 
of the subjects were considered to be extremely good. 

Experimental Procedure. The Purdue Mechanical Assembly Test 
was administered individually, by the writer, to each of the 100 subjects. 
Each of the sub-tests was timed and scored separately; timing was to the 
nearest half-second. The time required on each of the sub-tests was 
totalled separately for Forms A and B. These two sub-totals were then 
combined to yield a total time score for the entire test which was rounded 
to the nearest half-minute. Timing was begun when the subject removed 
the top from the box and continued until his assembly was completed. 
In the event the subject failed to complete the assembly within the 
maximum time limit, his score would be that maximum. In such a case, 
the examiner demonstrated the correct assembly of that box and then 
the subject passed immediately to the next. The maximum time limit 
for each of the boxes is given in Table 1. 


Table 1 

Maximum Time Limits for Problem Boxes in the Purdue 
Mechanical Assembly Test 


Box                    Time Limit (in minutes) 

A-1                             5 
A-2                            10 
A-3                            15 
   Total Form A                30 

B-1                             5 
B-2                            10 
B-3                            15 
B-4                            20 
   Total Form B                50 
   Total for Test              80 
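
The scoring rule described above (time each box, substitute the maximum when a box is not completed, sum, and round to the nearest half-minute) can be sketched as follows. This is an illustration only: the function name, the use of None for an unfinished box, and the example times are assumptions and do not come from the study.

```python
# Maximum time limits in minutes, taken from Table 1.
TIME_LIMITS = {"A-1": 5, "A-2": 10, "A-3": 15,
               "B-1": 5, "B-2": 10, "B-3": 15, "B-4": 20}

def total_time_score(times):
    """Return the total time score in minutes, rounded to a half-minute.

    `times` maps each box to the minutes taken, or to None when the
    subject did not finish; an unfinished box is scored at its limit.
    """
    total = 0.0
    for box, limit in TIME_LIMITS.items():
        taken = times.get(box)
        total += limit if taken is None else min(taken, limit)
    # Doubling, rounding, and halving snaps the total to a half-minute step.
    return round(total * 2) / 2

# Hypothetical subject who failed box A-3 and finished the rest.
example = {"A-1": 3.5, "A-2": 8.0, "A-3": None,
           "B-1": 4.0, "B-2": 10.0, "B-3": 12.5, "B-4": 17.0}
print(total_time_score(example))
```

Since a lower total means faster assembly, a subject who fails every box receives the ceiling score of 80 minutes.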


The Bennett Mechanical Comprehension Test, Form AA, was admin- 
istered and scored according to prescribed instructions. No time limit 
was imposed. In every case, it was given after the subject had taken the 
assembly test at some previous time. The time between the administra- 
tion of the two tests varied from one day to two months. 

The correlations of the two tests have been computed for the total 
group and for men and women separately. Estimates of the reliability 
coefficient of each test are also provided. 
Results and Discussion 

Results. The correlations of the two tests, together with the means 
and the standard deviations of the scores, are presented in Table 2. 

It is immediately apparent that there is a significant difference in the 
mean scores made by men and women on both tests. The correlation 
coefficient of .70 between the two tests is based on scores achieved by 
both sexes combined and must, therefore, be considered as spuriously 
high. Scores made by the male subjects alone yield a coefficient of cor- 
relation of .63. In the present study, the Kuder-Richardson formula 
for computing test reliability (4) gave a coefficient of reliability of .86 for 
the Bennett test. When the scores made on the three tests in Series A of 
the Purdue assembly test were correlated with scores on the first three 
tests in Series B, the obtained coefficient was .57. The correlation be- 
tween scores made on Series A and those made on all four tests in Series 
B was .58. From these data, Spearman-Brown estimates of the relia- 
bility coefficients were .73 and .76, respectively. Scores achieved by the 
group of eighty male subjects only were used in estimating the above 
reliability coefficients. Thus, it is seen that the obtained correlation be- 
tween the two tests (for male subjects) is about .18 lower than the geo- 
metrical mean of their respective reliabilities. 
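
The reliability arithmetic in this paragraph can be checked with a short sketch. It assumes the usual Spearman-Brown step-up for a test of doubled length and takes the geometric mean of the two reliabilities as the upper limit on the inter-test correlation; the function names are illustrative. It does not attempt to reproduce the .76 figure, which presumably allows for the unequal lengths of Series A and Series B, and simply uses that value as reported.

```python
import math

def spearman_brown(r_part, k=2.0):
    """Step a part-test correlation up to a test k times as long."""
    return k * r_part / (1 + (k - 1) * r_part)

def max_correlation(rel_x, rel_y):
    """Upper limit on the correlation of two tests: geometric mean of reliabilities."""
    return math.sqrt(rel_x * rel_y)

rel_purdue = spearman_brown(0.57)          # Series A vs. first three tests of Series B
ceiling = max_correlation(0.86, 0.76)      # Bennett .86 and Purdue .76, as reported
shortfall = ceiling - 0.63                 # obtained r for the male subjects
print(round(rel_purdue, 2), round(ceiling, 2), round(shortfall, 2))
```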


Table 2 


Correlations between the Purdue (P) and Bennett (B) Tests: 
Means and Standard Deviations of Scores 


Group                 N      r      M_P     σ_P     M_B     σ_B 
Males                 80    .63    52.10   12.46   45.45    8.41 
Females               20    .40    68.82    7.74   34.00    6.67 
Males and Females    100    .70    55.44   13.45   43.16    9.30 


The lower correlation (r=.40) between the tests for female subjects 
would seem to be a result of the greater homogeneity of this group. In 
spite of this lower correlation in the case of female subjects, the standard 
error of estimate of both tests is less for women than for men. The error 
of estimate on the Bennett test (from scores on the Purdue test) is 6.14 for 
female subjects and 6.56 for males; the Purdue test shows a standard 
error of estimate (from the Bennett scores) of 7.12 and 9.72, for women 
and men, respectively. 
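
The seeming paradox of a lower correlation but a smaller error of estimate for women follows from their smaller standard deviations. A minimal sketch, assuming the textbook formula for the standard error of estimate and the rounded values of Table 2, gives figures within a few hundredths of those reported (the small gaps presumably reflect rounding of r); the function name is illustrative.

```python
import math

def std_error_of_estimate(sd_y, r):
    """Standard error of estimate when predicting y from x: sd_y * sqrt(1 - r^2)."""
    return sd_y * math.sqrt(1 - r * r)

# Standard deviations and correlations as rounded in Table 2.
bennett_from_purdue = {"men": std_error_of_estimate(8.41, 0.63),
                       "women": std_error_of_estimate(6.67, 0.40)}
purdue_from_bennett = {"men": std_error_of_estimate(12.46, 0.63),
                       "women": std_error_of_estimate(7.74, 0.40)}
print(bennett_from_purdue)
print(purdue_from_bennett)
```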

Discussion. The correlation obtained in this study between scores 
made by male subjects on the Bennett test and the Purdue assembly test 
indicates that in spite of the difference in form of the stimuli presented 
to the subjects, both tests are eliciting responses which lead to much the 
same rank order. While this finding suggests that a common element is 
being measured by both tests, the difference between the obtained and 
the “maximum” correlation is sufficiently great so that the variation in 
the method should be considered as an additional factor. 

The obtained coefficients of reliability are in close agreement with 
those which have been reported previously. Other studies (1, 3) have 
yielded reliability coefficients of .84 for the Bennett test and .77 for the 
Purdue test. 

From the standpoint of employability and administration, it is obvious 
that the Bennett mechanical test has many advantages over the Purdue 
assembly test. Among these are: (1) a large number of subjects may 
easily be examined simultaneously, (2) instructions for taking the test are 
self-explanatory, (3) test materials are readily available, and (4) a shorter 
time is required for taking the test. The significance of this latter ad- 
vantage can hardly be overemphasized. Using the average time re- 
quired by both men and women on the tests (fifty-five minutes for the 
Purdue and twenty-five minutes for the Bennett), the ratio of the time 
required for one subject is approximately two to one. If it were desired 
to test fifty subjects, and if only one set of the assembly apparatus were 
available, the ratio would become 100 to one! 

Before one decides which of two tests should be employed in a battery, 
however, he must answer the critical question of the extent to which each 
of them correlates with some outside criterion. At the present time there 
are no studies known to the writer which provide an estimate of the 
correlation of these two tests with the same criterion for strictly com- 
parable groups.1 In the standardization of the Bennett test, it was found 
(1) that the test has a correlation of .50 with the average grades from 
technical military courses. Graney (3) reports a correlation of .51 be- 
tween his test and industrial merit ratings of ninety-one machinists. 
Correlation with the combined ratings of six instructors of forty-eight 
apprentice machinists was .34. If further studies should demonstrate 
that the tests show equivalent correlations with the same criteria for 
comparable groups, then one would be justified in using the more easily 
employed Bennett paper and pencil test alone. An exception to this 
might judiciously be made, however, if there is reason to believe that 
the first test has been invalidated in any way during its administration. 
In such a case, the assembly test might serve as an excellent further check 
on the aptitude of the subject under consideration. Since these con- 
clusions are based on a college population, further investigation will be 
necessary to determine whether the same relationship holds for particular 
occupational groups. 

1 It hardly need be pointed out that all too frequently in vocational guidance, 
selection and placement of employees, etc., it is necessary to assume that since a test 
has been shown to correlate significantly with performance on one type of job that it 
will correlate with another which is considered, a priori, to be a related job. In other 
words, there is an urgent need for the experimental determination of job families. 

Summary 

It was the purpose of this investigation to study the relation between 
performance on two tests, both of which purport to require the ability to 
reason about mechanical principles, but which use different testing 
media—one being a paper and pencil test and the other a performance 
test. The Bennett Test of Mechanical Comprehension, Form AA, was 
selected as representative of this first type test. The Purdue Mechanical 
Assembly Test was chosen as the performance test. 

The Purdue test was designed and constructed to be similar in prin- 
ciple to the Stenquist Mechanical Assembling Test and the Minnesota 
Mechanical Assembly Test. It embodies certain characteristics, e.g., 
sturdy, precision construction and non-stereotyped problem situations, 
which appear to be improvements upon those which were its prototypes 
(3). Eight sub-test problems compose the test, and they are divided 
into two forms of four sub-tests each. The sub-test order of administra- 
tion is, in each form, from the simple to the complex. Only seven sub- 
tests have been used in the present study—three in Form A and four 
in Form B. 

Both the Bennett test and the Purdue test were administered to 100 
college students at the State University of Iowa. The group consisted 
of eighty men and twenty women, whose ages ranged from eighteen to 
forty-four, who represented every school classification and twenty-two 
fields of specialization. In every case the Purdue test was administered 
first, and instructions given to the examinees were uniform. 

Scores made by the eighty male subjects on the two mechanical tests 
were found to correlate .63 with one another. This correlation, while 
about .18 lower than the maximum correlation as estimated from the 
reliabilities of the two tests, indicates that in spite of the difference in 
form of the stimuli presented to the subjects, both tests are eliciting re- 
sponses which lead to much the same rank order, at least in this type of 
population. It is pointed out, however, that the difference between the 
obtained and the “maximum” correlation is sufficiently great so that the 
variation in the method should be considered as an additional factor. 
The findings of this study seem to justify the conclusion that if future 
studies should demonstrate equivalent and comparable correlations be- 
tween each of the tests and some outside criterion, then the more usable 
Bennett test may safely be employed instead of a performance test in 
guidance, selection, and placement. It is suggested that the assembly 
test might furnish a further check on a person’s aptitude, if such is needed. 
It has been emphasized that while such a relationship may hold for a 
college population, further investigation is needed to determine if the 
same would be true for particular occupational groups. 


Received May 10, 1948. 
Norms for the Test of Mechanical Comprehension 


Clifford E. Jurgensen 
Minneapolis Gas Company 


Applied psychology textbooks and publishers of psychological tests 
frequently advise users to establish their own test norms. The assump- 
tion is made that norms established for a specific type of work within a 
single company are more useful to that company than are more general 
norms. 

Companies which do establish their own norms are frequently con- 
fronted with the fact that such norms differ considerably from those 
published in test manuals. Although the company norms may be more 
useful to that company than test manual norms, it is nevertheless im- 
portant to know to what extent these differ from the published norms. 
Such information is useful in determining, for example, whether applicants 
of the company are superior, equal to, or inferior to applicants of other 
companies (published norms). 

In other cases, such as when tests are used for guidance purposes, it 
may be undesirable or impossible to develop usable local norms. In such 
cases it is important to know whether or not the norms published in the 
test manuals are consistent with those obtained by other workers. 

As is the case with test norms, data on intercorrelations of one test 
with others are also needed. 

Normative and intercorrelational data are published here on the 
Test of Mechanical Comprehension, Form BB.1 Data were obtained 
from applicants for mechanical work consisting of (1) installing or re- 
pairing gas main or pipe, (2) installing, adjusting or repairing gas appli- 
ances such as ranges, refrigerators, water heaters, house heaters, etc., (3) 
repairing meters (soldering, sheet metal work, etc.) and (4) miscellaneous 
mechanical occupations such as electrician, welder, machinist, stationary 
firemen, auto mechanic, etc. Some applicants applied for a specific job 
and others applied for work within the broad area of “mechanical” work. 
No applicant who was hired has worked (or will work) in all of the above 
jobs, although versatility is considered desirable. In general the group 
corresponds rather closely to the typical “mechanical applicant group” 
and norms based on this group would be expected to agree with Bennett’s 

1 By Bennett, G. K., and Fry, D. E. Published by the Psychological Corporation, 
1941. 
norms for “applicants for light mechanical work” if Bennett’s norms are 
adequately established and if this group does not systematically 
differ from such a standardization group. 

Test scores on 2000 cases included in this study gave a mean score 
of 28.2 and a sigma of 10.9. These agree rather closely with Bennett 
and Fry, who reported² a mean of 29.1 and a sigma of 11.0. 


Table 1 
Comparison of Two Distributions of Scores of Applicants for Mechanical Work 
                   Bennett      Mpls. Gas Co.   Bennett and Fry   Mpls. Gas Co.
                   and Fry      (Original       (Best fitting     (Best fitting
Percentiles        (Manual)     score)          normal curve)     normal curve)

99                   54           53              55                54
95                   48           47              47                46
90                   45           42              43                42
85                   42           40              41                40
80                   39           39              38                37
75                   36           35              36                35
70                   35           34              35                34
65                   33           32              33                32
60                   32           31              32                31
55                   30           30              31                30
50                   28           28              29                28
45                   27           27              28                27
40                   26           25              26                25
35                   24           24              25                24
30                   23           23              23                22
25                   21           20              22                21
20                   19           19              20                19
15                   17           17              18                17
10                   15           14              15                14
5                    13           10              11                10
1                     6            4               3                 3

Number of Cases     435         2000             435              2000
Score Mean         29.1         28.2            29.1              28.2
Score Sigma        11.0         10.9            11.0              10.9


Raw test scores corresponding to the selected percentiles used in the 
test manual were computed from the data on 2000 job applicants. 
Table 1 indicates the close agreement between the two distributions at 
all points. For the twenty-one selected percentile points, eight have 
the same raw score, nine differ by only one raw score unit, two differ by 
two units, and two differ by three units. Test manual scores and those 
reported here were plotted on Otis’ Normal Percentile Chart³ and the 
remaining percentile points were compared. Discrepancies were even 
smaller than for the selected percentile points, because the selected 
percentile points were rounded to the closest whole number. These 
comparisons were based on original data without any smoothing of the 
curves. Smoothing of the distribution resulted in closer agreement. 

2 Bennett, G. K., and Fry, D. E. Manual of Directions, Test of Mechanical 
Comprehension, Form BB. New York: The Psychological Corporation, 1941. 
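
The tally of agreement between the manual norms and the norms reported here can be checked directly from the two score columns of Table 1. The following is a minimal sketch in Python; the variable and function names are illustrative only and are not part of the original study.

```python
from collections import Counter

# Raw scores at the 21 selected percentiles (99, 95, 90, ..., 5, 1),
# copied from the first two score columns of Table 1.
MANUAL = [54, 48, 45, 42, 39, 36, 35, 33, 32, 30, 28,
          27, 26, 24, 23, 21, 19, 17, 15, 13, 6]
GAS_CO = [53, 47, 42, 40, 39, 35, 34, 32, 31, 30, 28,
          27, 25, 24, 23, 20, 19, 17, 14, 10, 4]


def tally_differences(a, b):
    """Count how many paired scores differ by 0, 1, 2, ... raw score units."""
    return Counter(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    tally = tally_differences(MANUAL, GAS_CO)
    for diff in sorted(tally):
        print(f"differ by {diff} unit(s): {tally[diff]} percentile points")
```

Run on the tabled values, the counts reproduce the figures given in the text (eight, nine, two, and two).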

The best fitting normal curves based on the mean and sigma of the 
two distributions were calculated. Very close agreement was found be- 
tween the two curves and both agreed very closely with the selected per- 
centile points of the two original distributions. 
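
A best fitting normal curve of this kind is fixed entirely by the mean and sigma, so its percentile points can be approximated as follows. This is a sketch only: it assumes the tabled values are the normal-curve scores rounded to the nearest whole number, and the rounding rule actually used in the study is not stated.

```python
from statistics import NormalDist


def normal_curve_scores(mean, sigma, percentiles):
    """Raw score at each percentile of a normal curve, rounded to a whole score."""
    curve = NormalDist(mu=mean, sigma=sigma)
    return {p: round(curve.inv_cdf(p / 100)) for p in percentiles}


if __name__ == "__main__":
    selected = [99, 95, 90, 75, 50, 25, 10, 5, 1]
    gas_co = normal_curve_scores(28.2, 10.9, selected)   # mean and sigma of the 2000 cases
    manual = normal_curve_scores(29.1, 11.0, selected)   # mean and sigma from the manual
    for p in selected:
        print(f"{p:>3}  manual curve {manual[p]:>3}  gas co. curve {gas_co[p]:>3}")
```

Small differences from the tabled figures may appear at the extremes, depending on how the original computations were rounded.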


Table 2 
Correlation with Other Measures 


Measure                 Sigma    Correlation with Test of Mechanical Comprehension, Form BB

Age                     8.0
Education               1.8
Vocabulary              5.1
Abstraction             2. 43
Mental Ability          11.8
Conceptual Quotient     14.9


Bennett and Fry have reported correlations of the Test of Mechanical 
Comprehension with various mental ability tests (College Board Ex- 
amination, American Council Psychological Examination, and Modified 
Alpha Examination). These range from .10 to .41. Correlations of the 
Test of Mechanical Comprehension with other measures were computed 
for 500 randomly selected cases from the applicant group discussed here. 
The measures used were age, education, and the various scores obtained 
on the Shipley-Hartford Institute of Living Scale.⁴ The mean and sigma 
of this group on the Test of Mechanical Comprehension were 29.2 and 11.1 
respectively, and so are almost the same as for the total applicant group 
reported here as well as that reported in the test manual. Results 
(given in Table 2) show that the Test of Mechanical Comprehension is 
relatively independent of these other measures. 


Conclusions 
Norms and intercorrelations reported here are in close agreement 
with those reported by Bennett and Fry. Users of the test who cannot 
develop norms for their specific situations can thus place more confidence 
in the published norms than is frequently the case. The close agreement 
between the norms given in the manual and those reported here does not 
mean that users should not devise their own norms when conditions permit. 
It does mean, however, that lack of such agreement (if found by 
others) should not hastily be interpreted to mean that the test manual 
norms are inaccurate. In case of such disagreement a profitable search 
might be directed toward finding out why the obtained norms are not 
comparable to the published norms. 

3 Published by World Book Company, 1938. 
4 Published by the Institute of Living (formerly called Hartford Retreat), Hartford, 
Connecticut. 

Received September 18, 1948. 

Reliability of Abbreviated Job Evaluation Scales * 


David J. Chesler 
Personnel Research Institute, Western Reserve University 


The purpose of this investigation was to compare the abbreviated job 
evaluation scales that would be derived by application of the Wherry- 
Doolittle selection method (6) when all variables except the raters were 
held constant. 

In any comparative study of job evaluation systems there are at least 
three variables to be considered. These are the job evaluation manuals, 
the jobs to which the manuals are applied, and the job evaluators or 
raters. There have been few studies, if any, of abbreviated job evalu- 
ation scales in which any of these variables was held constant enough to 
permit direct comparisons of the results obtained. 

In the present study two variables were held constant throughout: 
(1) the job evaluation manual; and (2) the jobs. 

In a sense the present study may be considered a preliminary but 
basic analysis of abbreviated job evaluation scales derived by the 
Wherry-Doolittle selection method. This method was first applied to job 
evaluation systems by C. H. Lawshe, Jr. (2), who may be considered the 
“discoverer” of the abbreviated scale and also its chief advocate. At the 
present writing Lawshe and various associates have published four studies 
(2, 3, 4, 5) which utilized this technique. However, neither within any 
of these studies, nor among them, were the variables held constant enough 
for direct comparisons. 

The present investigation has attempted to answer the question: 
How “reliable” is the Wherry-Doolittle selection method as applied to 
job evaluation systems? That is, given the same jobs and the same job 
evaluation manual, will the same abbreviated scales be derived with 
different raters? 

Method 

Job raters in four industrial organizations rated independently 
descriptions and specifications for 35 salaried jobs on the same job 
evaluation manual. The jobs and the manual are the “standard jobs” and 
“standard manual” reported in a previous study (1), and the organizations 
and raters are also the same as those reported previously. The 
standard manual was a typical point rating manual with 12 factors. 

* A condensation of a portion of a Ph.D. thesis submitted to the Graduate School 
of Western Reserve University in 1948. 

Results and Interpretation 


Abbreviated Scales Derived from the Standard Manual. The Wherry- 
Doolittle selection method was applied to the standard manual factor 
ratings submitted independently by the raters in companies A, B, and C, 
with total point rating as the statistical criterion. This is exactly the 
same procedure followed by Lawshe and various associates (2, 3, 4, 5). 
However, the difference between the present study and those conducted 
by Lawshe is that here there was rigid control of the jobs rated and the 
manual used, that is, all raters rated the same jobs on the same manual. 

The abbreviated scales identified with different raters evaluating 
the same jobs on the same manual are presented in Table 1. 


Table 1 


Abbreviated Scales Derived from the Standard Manual by Raters in 
Three Companies Who Rated the Same Jobs 


      Co. A                 Co. B                 Co. C
Factor No.    R       Factor No.    R       Factor No.    R
    1       .892          1       .896          4       .902
    4       .954          4       .946          1       .961
    5       .965          8       .972          8       .972
    8       .976          5       .980          5       .987
    3       .983                                2       .990
   10       .989


Key to factor numbers: 1. work experience; 2. essential knowledge and training; 
3. dexterity; 4. character of supervision received; 5. character of supervision given; 
8. responsibility for confidential matters; and 10. responsibility for accuracy—effect 
of errors. 


The Wherry-Doolittle technique was applied first to the data sub- 
mitted by Co. A, and it was continued until six factors had been identified. 
Carrying out the Wherry-Doolittle process to this length was an ex- 
ploratory measure to obtain an idea of the magnitude of the shrunken 
R’s that might be expected. It was decided to stop the Wherry-Doolittle 
process when the shrunken R attained a magnitude of .980. However, in 
the case of Co. C the correlations between factor 5 and the other factors 
had to be computed for another purpose, so that it was relatively simple 
to identify an additional factor, namely factor 2. 

It will be noted that the first four factors identified with each group 
of raters were the same, although the order in which they were identified 
was not the same. These four factors are “work experience,” “character 
of supervision received,” “character of supervision given,” and 
“responsibility for confidential matters.” Differences in the three instances 
with respect to the order in which the factors were identified are 
apparently due to differences among the raters since the jobs and the job 
evaluation manual were constant throughout. 

If three factors are decided upon to comprise the abbreviated scale, 
then the same three factors have been identified in two out of three in- 
stances. This point is mentioned because in various studies (2, 3, 4, 5) 
Lawshe and his associates identified three factors to comprise the ab- 
breviated scale. 

Adequacy of Abbreviated Scales Derived from Standard Manual. A 
test of the adequacy of an abbreviated job evaluation scale is the degree 
to which jobs are displaced from the labor grades in which they were 
placed by the original scale. In the present study a comparison was made 
of the accuracy with which three abbreviated scales would predict the 
original values assigned to the jobs. Each abbreviated scale consisted 
of the same four factors and each was derived from the same original 
manual which was applied to the same jobs, but by different raters. In 
order to make this comparison, the three separate multiple regression 
equations for predicting total points from point ratings on “work 
experience,” “character of supervision received,” “character of supervision 
given,” and “responsibility for confidential matters” were computed. 
These three prediction equations were as follows: 


Co. A: $TP_{SM} = 1.2F_1 + 2.3F_4 + 1.4F_5 + 1.6F_8 + 66.0$
Co. B: $TP_{SM} = 1.4F_1 + 1.7F_4 + 2.1F_8 + 1.2F_5 + 57.2$
Co. C: $TP_{SM} = 2.6F_4 + 1.3F_1 + 1.5F_8 + 1.6F_5 + 53.3$


“TP” and “SM” indicate “total points” and “standard manual,” 
respectively. The standard errors of estimate for these three equations 
were 13.6, 12.1, and 10.9 respectively. The multiple R’s for these three 
equations were .98, .98, and .99 respectively. All of these R’s are signifi- 
cant at the one per cent level. These prediction equations were applied 
to the ratings given on the four factors by the raters in Co. A, Co. B, 
and Co. C, respectively, and three corresponding sets of predicted scores 
were obtained. 

A uniform labor grade of 25 points was adopted as the classification 
plan for the standard manual so that the comparative adequacies of the 
three abbreviated scales could be easily studied. The range of points 
for the classification plan of the standard manual was 400 points (a 
minimum of 100 points for any job, and a maximum of 500 points), so that 
there was a total of 16 labor grades. The results are presented in Table 2, 
which shows the per cent of jobs in each instance which remained in the 
same labor grade or which were displaced into another labor grade. It 
will be noted that all the jobs remained in the same labor grade as the 
original classification or were displaced into a labor grade adjacent to that 
of the original classification. 


Table 2 

Labor Grade Displacement for 35 Standard Jobs with Abbreviated 
Scales Derived from Standard Manual 

                   Co. A          Co. B          Co. C
Displacement       f     %        f     %        f     %
+1                12   34.3       7   20.0       5   14.3
 0                17   48.6      18   51.4      26   74.3
-1                 6   17.1      10   28.6       4   11.4
Totals            35  100.0      35  100.0      35  100.0
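
The labor grade classification described above (25-point grades running from 100 to 500 points) can be sketched in Python as follows. The grade numbering, the treatment of scores falling exactly on a grade boundary, and the sign convention for displacement are assumptions made for illustration; the study does not state them.

```python
MIN_POINTS = 100   # minimum total points for any job (from the text)
MAX_POINTS = 500   # maximum total points for any job (from the text)
GRADE_WIDTH = 25   # points per labor grade (from the text)
N_GRADES = (MAX_POINTS - MIN_POINTS) // GRADE_WIDTH   # 16 labor grades


def labor_grade(points):
    """Map total points to a labor grade numbered 1..16.

    Assumption: grade 1 covers 100 up to (but not including) 125 points,
    grade 2 covers 125 up to 150, and so on; 500 points falls in grade 16.
    """
    clipped = min(max(points, MIN_POINTS), MAX_POINTS)
    grade = int((clipped - MIN_POINTS) // GRADE_WIDTH) + 1
    return min(grade, N_GRADES)


def displacement(original_points, predicted_points):
    """Labor grades by which the predicted score moves a job.

    Assumption: a positive value means the abbreviated scale places the
    job in a higher grade than the original scale did.
    """
    return labor_grade(predicted_points) - labor_grade(original_points)


if __name__ == "__main__":
    # Hypothetical scores, not taken from the study.
    print(displacement(original_points=210, predicted_points=228))
    print(displacement(original_points=210, predicted_points=215))
```

Note that a small point deviation can still move a job into an adjacent grade when the original score lies near a grade boundary, which is why displacement and point deviation are tabulated separately.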


Another analysis of the same data is presented in Table 3, which shows 
how ratings with the abbreviated scale deviated as much as 12.5 points 
(0.5 labor grade), 25 points (1.0 labor grade), and more than 25 points, 
from total points on the original scale. In the three instances 62.8 per cent, 
68.5 per cent, and 74.2 per cent of the predicted ratings deviated from 
the original ratings by 12.5 points or less. In two instances 94.2 per cent, 
and in one instance 97.1 per cent, of the predicted ratings deviated from 
the original ratings by 25 points or less. It should be noted that the 
standard errors of estimate (13.6, 12.1, and 10.9) for the three prediction 
equations are approximately equal to 12.5 points or 0.5 labor grade, 
indicating that about 68.2 per cent of the predicted scores would be within 
approximately the value of 0.5 labor grade of the original scores. 
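
The relation between a standard error of estimate and the share of predictions falling within half a labor grade can be illustrated as follows. The sketch assumes that errors of prediction are normally distributed, which is the assumption behind the 68.2 per cent figure; the function name is illustrative.

```python
from math import erf, sqrt


def share_within(limit, standard_error):
    """Proportion of prediction errors within +/- limit, assuming normal errors."""
    return erf(limit / (standard_error * sqrt(2)))


if __name__ == "__main__":
    half_grade = 12.5   # points in 0.5 labor grade
    for company, se in [("A", 13.6), ("B", 12.1), ("C", 10.9)]:
        pct = 100 * share_within(half_grade, se)
        print(f"Co. {company}: about {pct:.1f} per cent within {half_grade} points")
```

A standard error of exactly 12.5 points gives roughly 68 per cent; the smaller the standard error, the larger the expected share, which agrees with the ordering of the observed percentages for the three companies.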


Table 3 

Point Deviation for 35 Standard Jobs with Abbreviated Scales 
Derived from Standard Manual 

                       Co. A          Co. B          Co. C
Point Deviation        f     %        f     %        f     %
25.01 plus             1    2.9       1    2.9
12.51 to 25.00         5   14.3       3    8.6       3    8.6
0 to 12.50            12   34.2      15   42.8      17   48.5
0 to -12.50           10   28.6       9   25.7       9   25.7
-12.51 to -25.00       6   17.1       6   17.1       5   14.3
-25.01 minus           1    2.9       1    2.9       1    2.9
Totals                35  100.0      35  100.0      35  100.0


Comparison of Multiple Regression Equations for Abbreviated Scales 
Derived from Standard Manual. The problem to be analyzed here is the 
similarity of the three multiple regression equations computed for use 
with the abbreviated scales derived from the standard manual. Put in 
practical terms, the problem may be phrased thus: How much difference 
is there among the three multiple regression equations, derived 
independently in three instances and designed to predict three independent 
sets of standard manual total scores, in predicting a fourth set of standard 
manual total scores? 

The standard manual ratings obtained in Co. D were used as the 
fourth set of ratings to which the three multiple regression equations 
obtained in companies A, B, and C were applied. That is, each of the three 
multiple regression equations was applied to the ratings assigned by the 
rater in Co. D to “work experience,” “character of supervision received,” 
“character of supervision given,” and “responsibility for confidential 
matters.” Comparisons were then made of the predicted scores obtained 
by application of each of the three multiple regression equations and the 
total scores assigned by the rater in Co. D. 


Table 4 


Comparison in Terms of Labor Grade Displacement of Total Points Assigned by Rater 
in Company D to 35 Standard Jobs and Predicted Total Points Computed 
from Prediction Formulae of Raters in Companies A, B, and C 


                   Co. A          Co. B          Co. C
Displacement       f     %        f     %        f     %
+2                 1    2.9       1    2.9
+1                20   57.1       9   25.7      14   40.0
 0                12   34.3      16   45.7      15   42.8
-1                 2    5.7       9   25.7       5   14.3
Totals            35  100.0      35  100.0      35  100.0


Table 4 shows the per cent of jobs in each instance which remained in 
the same labor grade as total points assigned by the Co. D rater placed 
the jobs, and also the per cent of jobs which were displaced one or two 
labor grades when compared with the labor grade classification deter- 
mined by the rater in Co. D. It will be noted that in all three instances 
97.1 per cent of the jobs either remained in the same labor grade as the 
rater in Co. D classified them, or they were displaced into an adjacent 
labor grade. So far, this finding indicates that for practical purposes the 
three prediction formulae are very much alike. 

Table 5 shows an analysis of the same data in terms of point deviation. 
In all three instances 91.4 per cent of the jobs deviated 25 points or less 
from the values assigned to the jobs by the rater in Co. D. This con- 
firms the conclusion that for practical purposes the three prediction 
formulae are very much alike. 
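
The 91.4 per cent figure can be recomputed from the frequencies of Table 5. In the sketch below one cell for Co. C is illegible in the printed table and is marked `None`; it is filled in on the assumption that each column totals 35 jobs, as the Totals row indicates. Names are illustrative only.

```python
TOTAL_JOBS = 35

# Frequencies by point-deviation band, in the row order of Table 5:
# 25.01 plus, 12.51 to 25.00, 0 to 12.50, 0 to -12.50,
# -12.51 to -25.00, -25.01 minus.
FREQ = {
    "Co. A": [2, 14, 9, 7, 2, 1],
    "Co. B": [1, 7, 12, 8, 5, 2],
    "Co. C": [2, 8, None, 9, 2, 1],   # None marks the illegible cell
}


def fill_missing(freqs, total=TOTAL_JOBS):
    """Fill a single missing frequency so that the column sums to `total`."""
    known = sum(f for f in freqs if f is not None)
    return [total - known if f is None else f for f in freqs]


def per_cent_within_25(freqs):
    """Per cent of jobs deviating 25 points or less (the four middle bands)."""
    return round(100 * sum(freqs[1:5]) / sum(freqs), 1)


if __name__ == "__main__":
    for company, freqs in FREQ.items():
        complete = fill_missing(freqs)
        print(company, complete, per_cent_within_25(complete))
```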

Table 5 


Comparison in Terms of Point Deviation of Total Points Assigned by Rater in 
Company D to 35 Standard Jobs and Predicted Total Points Computed 
from Prediction Formulae of Raters in Companies A, B, and C 


                       Co. A          Co. B          Co. C
Point Deviation        f     %        f     %        f     %
25.01 plus             2    5.7       1    2.9       2    5.7
12.51 to 25.00        14   40.0       7   20.0       8   22.9
0 to 12.50             9   25.7      12   34.2      13   37.1
0 to -12.50            7   20.0       8   22.9       9   25.7
-12.51 to -25.00       2    5.7       5   14.3       2    5.7
-25.01 minus           1    2.9       2    5.7       1    2.9
Totals                35  100.0      35  100.0      35  100.0


Summary and Conclusions 


1. The basic methodological feature of the present study was to have 
raters in various companies evaluate a standard set of job descriptions 
and specifications for 35 representative salaried jobs on a standard 
manual. The standard manual was of the point rating type and con- 
tained 12 factors. 

2. The Wherry-Doolittle selection method was applied to the standard 
manual factor ratings submitted by analysts in three companies. The 
first four factors identified in each company were the same, although the 
order of identification was not the same. These four factors were “work 
experience,” “character of supervision received,” “character of 
supervision given,” and “responsibility for confidential matters.” The first 
three factors identified were the same in two of the three companies. 
Differences in the three companies with respect to the order in which the 
factors were identified are apparently due to differences among the raters 
since the jobs rated and the job evaluation manual used were constant 
for all raters. 

3. Application of the three abbreviated scales, each containing the 
same four factors, resulted in all of the jobs remaining in the same labor 
grade as the original classification, or in being displaced into a labor 
grade adjacent to that of the original classification. In the three 
companies 62.8 per cent, 68.5 per cent, and 74.2 per cent of the predicted 
ratings deviated from the original ratings by the point value of 0.5 labor 
grade or less; similarly 94.2 per cent, 94.2 per cent, and 97.1 per cent of 
the predicted ratings deviated from the original ratings by the point value 
of 1.0 labor grade or less. 

4. Application of the three abbreviated scales to a fourth independent 
set of ratings resulted in three sets of predicted scores which were, as a 
whole, very much alike, as measured in terms of labor grade displace- 
ment and point deviation. 

5. The present investigation substantiates the findings of Lawshe and 
various associates (2, 3, 4, 5) that abbreviated job evaluation scales 
justify themselves from the standpoint of technical and scientific accuracy 
and economy. In this connection, however, it may be pointed out that they 
may not justify themselves psychologically, since they are liable to create 
a belief among employees that all aspects of each job have not been fully 
considered. 

Received July 12, 1948. 
Early publication. 

A Note on Machine Scoring the Kuder Preference Record 


Louis Lauro 
The City College of New York 


The use of the Kuder Preference Record has greatly increased in re- 
cent years, particularly since it has become available in machine scoring 
form. The time required on the original hand-scoring form was 15 to 20 
minutes per paper. The IBM machine-scored answer sheet for the 
Kuder must be inserted eighteen times in the scoring machine. On a 
trial run of thirty answer sheets the scoring time was found to be 62 
minutes. This is indeed a saving in time over the hand scoring method, 
but the time can be reduced still further. 

Scores for nine different occupational areas are obtained on the Kuder. 
The score for each area is based on certain responses. Most of the re- 
sponses are counted towards the score for more than one area. However, 
no response which is counted towards the score for Area 2 (Computa- 
tional) is counted towards the score for Area 3 (Scientific). Similarly, no 
response which is counted towards the score for Area 6 (Literary) is 
counted towards the score for Area 7 (Musical). Therefore, in machine 
scoring, if a set of keys (“Elimination” and “Rights”) is so punched that 
the score for Area 2 appears on the “Rights” circuit and the score for 
Area 3 on the “Wrongs” circuit, both these scores can be obtained with 
one insertion of an answer sheet. In a similar manner, if a set of keys 
is so punched that the score for Area 6 appears on the “Rights” circuit, 
and the score for Area 7 on the “Wrongs” circuit, both of these scores 
can be obtained with one insertion. All positions except those responses 
that are counted towards the score for Area 3 (or 7) are punched on one 
key (“Elimination” key) and this key is used in conjunction with a key 
(“Rights” key) on which all the responses that are counted towards the 
score of Area 2 (or 6) are punched. Both “R” and “W” field selection 
holes should be punched on the elimination key for all ten fields. 

However, in practice, two scores are obtained for each interest area 
simply because the number of items on the Kuder necessitates the use of 
both sides of the answer sheet. So, scores for Area 2 and for Area 3 can 
be obtained for each side of the answer sheet by one insertion. Similarly, 
scores for Area 6 and for Area 7 can be obtained for each side of the 
answer sheet by one insertion. The total number of insertions per answer 
sheet (as well as the total number of key changes) is thus reduced from 
eighteen to fourteen. The reduction in scoring time was found to be 
seven minutes for thirty answer sheets, or a saving of 11 per cent. 
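
The arithmetic of the saving can be laid out as a short Python sketch. The function names are illustrative; the figures (nine areas, two sides of the answer sheet, two pairs of areas with no responses in common, and scoring times of 62 and 55 minutes) are those given in the text.

```python
def insertions_per_sheet(n_areas, sides, disjoint_pairs):
    """Insertions needed when each disjoint pair of areas shares one insertion.

    `disjoint_pairs` lists pairs of areas that have no responses in common,
    so that one area can be read on the "Rights" circuit and the other on
    the "Wrongs" circuit in a single insertion.
    """
    paired_areas = {area for pair in disjoint_pairs for area in pair}
    if len(paired_areas) != 2 * len(disjoint_pairs):
        raise ValueError("an area may appear in at most one pair")
    per_side = n_areas - len(disjoint_pairs)
    return per_side * sides


def saving_per_cent(before, after):
    """Per cent reduction from `before` to `after`."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    print(insertions_per_sheet(9, 2, []))                  # one insertion per area per side
    print(insertions_per_sheet(9, 2, [(2, 3), (6, 7)]))    # with the two paired keys
    print(round(saving_per_cent(62, 62 - 7)))              # saving in scoring time
```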

The application of this principle to shorten the scoring time of the 
Strong Vocational Interest Blank is not possible since there are no two 
occupational-group scales which do not have responses in common. 


Received April 9, 1948. 

The Use of Rating Scales and Personal Inventories 
to Check Each Other 


James D. Weinland 
New York University 


Rating scales and personality inventories have been called “the most 
used and worst form of human measurement.” In regard to the first 
characteristic of being widely used there is little question. Merit ratings 
and interview ratings are familiar throughout business and industry. 
Personal inventories are almost as common as crossword puzzles. In 
regard to the second characteristic of their being the worst forms of 
measurement there might be some difference of opinion. They are both, 
however, subjective instruments and the difficulty of completely validat- 
ing either one of them has been insurmountable. 


Purpose of Study 


The purpose of this study is to demonstrate a method of improving 
the validation and use of rating scales and personal inventories by 
using them on the same individuals to check each other. 

In the early days of rating scales Hollingworth tried self ratings and 
found them to be completely unreliable. It appears that people cannot 
evaluate themselves very well on a few points only, giving relative 
judgments on each, as a rating scale demands. But experience indicates 
people can do what amounts to the same thing if the form is changed. 
They can, with some accuracy, answer many personal questions yes or 
no. In brief they can fill out a questionnaire or personal inventory. 

It becomes possible then to make out a rating scale for the use of 
others, and an inventory on the same attributes, to be filled out by the 
subject himself. In this case the same qualities are being measured by 
different instruments and the validity of these instruments can be checked, 
to some extent, by comparing the measurements obtained. This was 
done. 


Procedure 


A graphic rating scale and a personal inventory, with the same 
divisions or subheadings, were constructed to measure personal efficiency. 
The inventory was administered to 57 subjects. These subjects were 
rated by three people each on the same attributes that had been measured 
by the inventory. The method of obtaining the ratings is one that might, 
under some circumstances, be useful elsewhere. Each subject was re- 
quested to name one of his acquaintances who knew a number of his 
other friends. The subject then wrote his name on each of the three 
rating scales, placed them in stamped envelopes addressed to the author 
of this article, and handed them to the friend named with the request that 
he distribute the rating scales to three people who could and would rate 
him, but who would remain unknown to the subject. It was stressed 
that the subject of the ratings would not try to find out who was rating 
him, and that the raters might therefore be assured of remaining uni- 
dentified. With the rating scales was a note explaining the situation 
and suggesting that fairness rather than leniency or flattery would do 
their acquaintance the most good and be the act he would consider the 
most friendly. 

When the rating scales were returned, the descriptions were replaced 
with numerals, from the highest to the lowest on each line, with the 
following values, 100—75—50—25—0. If the rater had checked in be- 
tween descriptions the average values of 87.5—62.5—37.5—12.5 were 
used. These values, it will be observed, harmonize very well with the 
percentiles obtained for the inventory. 

Each subject had been rated by three individuals so these ratings for 
each attribute were averaged, and this average charted on the 
psychograph. The inventory was handled by establishing norms and obtaining 
percentiles. The attributes studied carried the definitions and 
explanations given below. 

Realism: the ability to see facts and act accordingly. 
Economy: thrift, making the most of what one has (time, energy, things). 
Appreciations: interests, tastes, and hobbies. 
Sense of Justice: open-mindedness, fair play, weighs both sides of a question. 
Motivation: drive, degree of activity and amount of energy expended. 
Investment in Self: learning in and out of school. 
Personal Integration: goals and plans, degree of personal organization. 
Social Integration: appearance, manner and social skills. 
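
The conversion of check marks to numerals and the averaging of the three raters can be sketched as follows. The encoding of a check mark as a `position` (0 for the highest description, 4 for the lowest, with half steps for checks between descriptions) is an assumption made for this illustration and is not part of the original procedure.

```python
from statistics import mean


def check_value(position):
    """Numeric value of one check mark on a line of the rating scale.

    `position` counts descriptions from the top: 0 scores 100, 1 scores 75,
    2 scores 50, 3 scores 25, and 4 scores 0. A check between two
    descriptions is entered as a half step (0.5, 1.5, ...) and scores the
    average of its neighbours (87.5, 62.5, 37.5, 12.5).
    """
    if not 0 <= position <= 4 or (position * 2) % 1:
        raise ValueError("position must be one of 0, 0.5, 1, ..., 4")
    return 100 - 25 * position


def attribute_score(positions):
    """Average the values given by the raters for one attribute."""
    return mean(check_value(p) for p in positions)


if __name__ == "__main__":
    # Hypothetical checks by three raters on one attribute.
    print(attribute_score([0, 1, 0.5]))
```

The averaged value is then plotted on the psychograph beside the inventory percentile for the same attribute.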

The psychograph presented below is that of one person, Subject A. 
The self estimates were obtained from the personal inventory scores and 
the ratings by the use of the graphic rating scale, described above. 

At first it might seem misleading to put on one psychograph two sets 
of data arrived at by different techniques. It is probably true that were 
there but one score from the rating scale and one from the inventory a 
direct comparison of them would not be legitimate. But from each 
there are a number of scores that make a pattern, and a comparison of 
the patterns may be very useful. 

Fig. 1. Comparison of personal inventory scores with rating scale scores, 
Subject A. 


An immediate value in comparing directly the two forms of measure- 
ment is that it is often doubly reassuring to the subjects. Many people 
do not like to be rated by others only, feeling that they are not thoroughly 
understood or sympathetically judged. Some people like to know what 
others think of them. The majority of our subjects displayed as much 
interest in the comparison of the two types of scores as they did in the 
basic experience of being measured and receiving indications of their 
relatively high or low standing. 

In plotting psychographs for the fifty-seven subjects some interesting 
contrasts appeared. There are (a) individuals who consistently make 
higher scores on the inventory than those given them on the rating 
scales; (b) those who are consistently rated higher by others than they 
score themselves on the inventory; and (c) those for whom some rating 
scores are higher and some inventory scores are higher. 


Correlations 


Reliabilities for the eight sections of the personality inventory ob- 
tained by the split-half method are given below in the first column of 
Table 1. Correlations between results obtained by inventory and rating 
scale methods are shown in column two of the same table. 


Table 1 
Reliability of the Personal Inventory and Correlation between 
Inventory and Ratings 

Traits  Inventory Reliabilities  Rating-Inventory Correlation
Realism  .77  .70
Economy  .76
Appreciations  .81  .33
Sense of Justice  .79  .21
Motivation  .40
Self Investment  .31  .15
Personal Integration  .29
Social Integration  .66  .59


Examination of the various correlations suggests some of the values 
that will accrue from making the comparison of ratings and inventories. 
The highest intercorrelations are in social integration and realism. Both 
of these characteristics are of the type that seem to permit observation 
by others. Low correlations were found in self-investment, sense of 
justice, and personal integration. These characteristics are not so 
obvious, nor so subject to observation by others. 

Some of our partial-inventory reliabilities can be raised, and the 
inventory is being item-analyzed to improve it. But even as it stands the 
scores from the inventory throw a contrasting light on the ratings, 
particularly those of deep or hidden characteristics. Where correlations 
are high, considerable confidence in the results may be assumed. 
Where correlations are low, further work on the measuring instruments, 
both inventory and rating scales, is suggested. In the meantime, the 
subject measured is protected in that he is not given either extreme 
score alone. 


Summary 


1. The use of ratings and personal inventories can be improved by 
constructing them in parallel to measure the same individuals, in the 
same characteristics, by different methods. The two instruments check 
and to some extent may be said to correct each other. 

2. Individuals so far tested responded favorably to the double 
measurement. They were happy to have the chance to protect them- 
selves, and equally glad to see how the opinions of others compared with 
their own. 

3. Low correlations between results of the two methods indicate 
further work is needed on the measuring instruments, and caution is 
suggested in the use of the data. The higher the correlations between 
the results obtained by the two instruments the greater the confidence 
that may be felt in the result. 

4. A number of interesting clinical possibilities are indicated by the 
method; particularly that of examining personalities whose inventory 
scores are either consistently above or below their rating scores. 

The “Liberalism” of Congressmen Voting For and 
Against the Taft-Hartley Act 


Philip Ash 
The Pennsylvania State College 


The Taft-Hartley Act constitutes one of the most striking recent instances 
of division between “liberals” and “conservatives.” However, 
as yet no studies have appeared that provide concrete evidence for the 
extent and character of this division. 

This paper is intended as a brief note on the problem with respect to 
congressmen as they voted on the motion to override the veto of the 
Taft-Hartley Act. 


Source of Data 


In the February 1948 issue of the Journal of Applied Psychology, 
Brimhall and Otis¹ reported a study of the consistency of voting of 
congressmen on a number of important issues over a five-year period. Using 
as source data reports in the New Republic in which the votes of 512 
congressmen were recorded as “progressive” or “anti-progressive,” they 
assigned values on a seven-point scale (from “liberal” to “conservative”) 
to each congressman for each of four report periods studied. 

In addition, for those congressmen for whom data were available for 
at least two report periods, including the last one (1947), they gave an 
average scale value. No congressmen were included in their study whose 
first election was to the Eightieth Congress. 

The average ratings they provided were used in the present study. 
Where no average rating was computed by them, the ratings available 
were averaged for the purposes of the present paper. 

Their tables were checked against the roll calls on the vote to override 
the Taft-Hartley veto in the Congressional Record of June 20, 1947 (for 
the House of Representatives) and June 23, 1947 (for the Senate). The 
congressmen on whom they presented ratings were then broken down into 
four groups: those who voted to sustain the veto,² those who voted to 
override, those who were absent, and those who were not re-elected to 
the Eightieth Congress. 

¹ Brimhall, Dean R., and Otis, Arthur S. Consistency of voting by our 
congressmen. J. appl. Psychol., 1948, 32, 1-14. 
² A vote to sustain is defined as “liberal”; a vote to override as “conservative.” 
It should be understood that “liberalism” as used here refers to the 
unvalidated judgment of the New Republic. 




Distribution of Votes on the Taft-Hartley Act 


The Brimhall and Otis study did not, as has been indicated above, 
include ratings for all of the 96 senators and 435 representatives who com- 
prised the Eightieth Congress. Furthermore, of the senators, only 73 of 
those listed by Brimhall and Otis were in the Eightieth Congress; 15 
were not. Of the representatives, 310 were in the Eightieth Congress, 
125 were not. Those who were not in the Eightieth Congress failed to 
stand for election, or failed to be re-elected after serving in a previous 
Congress. 

However, over 70% of the vote cast in both the House and the Senate 
is represented in the present study. In the case of both houses, a slightly 
higher proportion of the “progressive” vote is represented (78% in the 
House and 88% in the Senate) than of the “conservative” vote (71% in 
the House and 72% in the Senate). 

Those who voted to sustain, both in the House and in the Senate, 
ranged from “liberal” (rating of 1 or 2) to at least “middle of the road” 
(rating of 4 or 5). However, in the House 87.7% of the votes to sustain 
were cast by the definitely “liberal,” while in the Senate the comparable 
figure was 81.9%. On the other hand, those who voted to override 
ranged from “liberal” to “conservative.” The definitely “conservative” 
(rating 6 or 7) composed only 43% of the overriding vote in the House, 
and 40.8% of the overriding vote in the Senate. In other words, 
clear-cut group differences emerge. Individuals who were rated as “liberals” 
voted overwhelmingly to sustain, but substantial numbers of the “liberals” 
also voted to override, a “conservative” act. 

This distribution of the votes emphasizes the limited applicability 
of the rating for prediction of the voting behavior of the individual con- 
gressman for a single issue. 

In both the House and the Senate, however, the mean rating for those 
who voted to sustain was over 3 scale points lower (more “liberal”) than 
the mean rating for those who voted to override. These differences, 
statistically significant in both cases far beyond the 1% level of confidence, 
are of the magnitude of almost half the total possible range. On a group 
basis, therefore, the Brimhall-Otis rating successfully distinguished be- 
tween the “liberals” and the “conservatives.” 

Finally, in both the House and the Senate, those congressmen rated 
by Brimhall and Otis who were not in the Eightieth Congress (did not 
stand for or lost the 1946 election) were as a group significantly more 
“liberal” than those who voted to override the veto (Table 2). In both 
cases this difference was about 2 scale points. These differences were 
also significant far beyond the 1% level of confidence. 


Table 2 

Distribution of Congressmen Rated by Brimhall and Otis Who Did Not Stand for or 
Were Not Reelected to the Eightieth Congress 

                                House of 
                             Representatives          Senate 
Rating                         No.       %          No.       % 
1                               42     37.2           5     33.3 
2                               19     16.8           1      6.7 
3                               12     10.6           2     13.3 
4                                9      8.0           4     26.7 
5                               14     12.4           1      6.7 
6                               14     12.4           2     13.3 
7                                3      2.6           -       - 
Total                          113    100.0          15    100.0 

Mean Rating                       2.89                  3.07 
Mean Rating of Override 
  Group (from Table 1)            4.88                  5.12 
Difference (Override 
  minus Not Elected)              1.99                  2.05 
t-ratio                          10.6                   6.6 


This finding is at least partially corroborated by the fact, noted above, 
that the vote to sustain group contained a higher proportion of holdovers 
than the vote to override group. 
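
The mean ratings in Table 2 follow directly from the rating frequencies. The short Python sketch below is an editorial illustration, not part of the original analysis: it recomputes the weighted means and the percentage distributions from the counts printed in the table. The variable and function names are hypothetical, and the override-group means are simply copied from the table.

```python
# Rating -> number of congressmen, transcribed from the "No." columns of Table 2.
HOUSE = {1: 42, 2: 19, 3: 12, 4: 9, 5: 14, 6: 14, 7: 3}
SENATE = {1: 5, 2: 1, 3: 2, 4: 4, 5: 1, 6: 2, 7: 0}

# Override-group means as quoted in Table 2 (the paper takes them from its Table 1).
OVERRIDE_MEAN = {"House": 4.88, "Senate": 5.12}


def mean_rating(freq):
    """Mean scale value, weighting each rating by the number of congressmen."""
    total = sum(freq.values())
    return sum(rating * count for rating, count in freq.items()) / total


def percentages(freq):
    """Percentage of the group falling at each rating, to one decimal place."""
    total = sum(freq.values())
    return {rating: round(100 * count / total, 1) for rating, count in freq.items()}


house_mean = mean_rating(HOUSE)
senate_mean = mean_rating(SENATE)

for chamber, mean in (("House", house_mean), ("Senate", senate_mean)):
    gap = OVERRIDE_MEAN[chamber] - round(mean, 2)
    print(f"{chamber}: mean rating {mean:.2f}, override minus not elected {gap:.2f}")

print("Senate percentages:", percentages(SENATE))
```

The t-ratios cannot be recomputed this way, since the spread of ratings within the override groups is not given in this table.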


Summary and Conclusions 


Using ratings of “liberalism” developed by Brimhall and Otis, a 
comparison was made of the differences in degree of “liberalism” between 
congressmen who voted to sustain the Taft-Hartley veto and those who 
voted to override. In addition, those who voted to override were 
compared with congressmen rated by Brimhall and Otis who were not in the 
Eightieth Congress, but were in previous Congresses. 

It was found that those who voted to sustain were significantly more 
“liberal” than those who voted to override. It was also found that those 
who failed to stand for election to, or failed in winning re-election to, the 
Eightieth Congress were more “liberal” than those who voted to override. 

However, at least a small proportion of “liberals” actually voted 
“conservative” (i.e., to override); on the other hand, no “conservatives” 
voted to sustain the veto. This finding suggests that the 
“liberalism-conservatism” index as here developed yields less reliable 
predictions of individual voting behavior on single issues than of group 
voting behavior on such issues or individual voting behavior on a group of 
issues considered together. 

This study, like the Brimhall and Otis study on which it draws heavily, 
has for a principal reason the exploration of consistency of political 
behavior in the democratic setting. It suggests, in addition to the types 
of research outlined by Brimhall and Otis, the need for certain 
methodological studies. These would include a study of the validity of the 
“liberalism” index used, correlation analysis to determine the reliability 
or consistency of voting behavior as measured by the index, and 
research to determine the parameters for an equation for individual 
prediction. 


Received April 5, 1948. 


The Pin Prick Method of Secret Balloting 


Raymond J. Corsini 
San Quentin, California 


A common method to find out what a group really feels about issues 
which they ordinarily will conceal is to use a secret ballot, wherein the 
anonymity of the individual subjects is preserved. Most often a 
mimeographed or printed sheet is used, on which subjects need write only 
“yes” or “no,” or perhaps check a number of alternatives. 

The writer has found the use of the simple method described below 
to have some advantages over the pencil-paper method and he presents 
it for the consideration of those who at times feel the need to sample the 
attitudes of individuals who may fear disclosure or betrayal of authorship 
by possible identification of their check marks. 

The ordinary type of question sheet is prepared and, instead of pencils 
being used, ordinary straight pins, or preferably toothpicks, are 
distributed with directions to push one hole at appropriate places if one 
wants to answer “yes” and two holes if one wants to answer “no.” Or, 
if a list of alternative responses is given, one merely punches a hole 
following the appropriate response, or a sequence of holes if he is to list 
alternatives in order of preference, etc. 

The advantages seem to be: 

1. The impression is gained that more people feel that this method is 
more secret than the use of pencils, since a pin hole is more anonymous 
than even pencil checks. 

2. Pencils are not required. These are often not possessed by 
institutional individuals, and lending them out often results in 
not getting them back. 

3. No writing surface is needed, so that this type of questionnaire can 
be used even when tables, one-armed chairs, etc., are not available. 


Received May 6, 1948. 


Distributions of Scores on the Wechsler-Bellevue Scales and 
the California Test of Mental Maturity at a 
V. A. Guidance Center 


May Herrmann and Roy B. Hackman 
Temple University 


The purpose of this paper is to report the results of an analysis of 
certain test scores of the 4500 veterans counseled at the Temple University 
Veterans’ Administration Guidance Center between September 1945 and 
November 1946. Of these 4500 veterans, 2289 were given the Wechsler- 
Bellevue Test and 571 were given the California Test of Mental Maturity. 
Staff Members at the Center were particularly interested in the results 
obtained on these tests of general mental ability. The question was 
raised in this Center as in others as to whether existing norms were ap- 
plicable to the veterans, or whether local norms should be made for use 
in veterans’ advisement centers (1). 

The mental ability test most often administered at the Temple Uni- 
versity Veterans’ Administration Guidance Center is the Wechsler- 
Bellevue. Tabulations completed at City College of New York and the 
University of Michigan confirm the popularity of this test in veterans’ 
advisement units (2,3). The reliability of the test has been the subject 
of other reports; Rabin concludes that the majority of studies show high 
correlations between the Wechsler-Bellevue and other individual and 
group measures of intelligence (4). He reports correlations with the 
Stanford-Binet (Form L) ranging from .62 in Anderson’s study of 112 
female college freshmen to .91 and .93 in Halpern’s and Benton’s studies 
of mental patients. He also quotes Lewinski’s study of 290 
“psychopathic and subnormal” naval recruits showing r = .73 between the 
Kent-Emergency Test and the Verbal Scale of the Wechsler-Bellevue. Rabin’s 
own study of 92 student nurses on the Army Alpha 5 and the Wechsler- 
Bellevue results in r = .74. The lowest correlation reported was in 
Anderson’s study of the ACE and the Wechsler-Bellevue obtaining r’s of 
.48 and .53. It was added, however, that the Verbal Scale correlation 
was higher than the full scale. Sartain’s study concurs with these results 
(5). He tested 50 college students in their freshman year and obtained 
correlations with Wechsler-Bellevue (1941 Edition) and other tests as 
follows: Revised Alpha Examination (Form 5): .74; Otis Self 
Administering Test of Mental Ability (Form A): .70; ACE (1942 Edition): .69; 
Stanford-Binet (Form L): .77. Watson’s article, which amplifies Rabin’s, 
states that there are fairly high correlations between the Wechsler- 
Bellevue Scales and verbal measures of intelligence, but the correlations 
with performance type scales are somewhat lower, although still sub- 
stantial. He confirms the trend reported by Rabin of relatively higher 
W-B I.Q.’s for duller subjects and relatively lower ones for brighter 
subjects (6). He describes Lewinski’s studies of 100 Naval recruits 
suspected to be mentally retarded, showing correlations of .65 and .64 
between Scale A and Scale B of the Herring Revision of the Binet-Simon 
Tests and the W-B Verbal Scale. Goldfarb’s study of 60 superior foster 
home children is also included with correlations of .86, .80 and .67 be- 
tween Stanford-Binet (Form L) and full, Verbal and Performance W-B 
Scales respectively. 


Procedures and Results 

The veterans in the group counseled at the Temple University Center 
were young; 48% between 20-24 years and 29% between 25-29 years. 
An analysis of the educational level showed that over half had completed 
high school, 44% having credit for 12 grades and 9% having more than a 
high school education; 15% had 8 grades or less and 32% had 9, 10, or 11 
grades. The group was 91% white and almost entirely male. 

The distribution of I.Q.’s obtained for 2289 cases at the Temple 
University Center is very similar to the test norms as published by 
Wechsler (7) (see Table 1). The mean I.Q. is 101.0, the standard 
deviation 13.96, and half the cases (52%) have I.Q.’s between 90 and 109. 
These results show this veteran group to be comparable to Wechsler’s 
norm group. 


Table 1 
Distribution of Wechsler-Bellevue Total I.Q.’s 

                                        No Disability        Disability 
                                         Rating for          Rating for 
                     All Cases         Psychoneurosis      Psychoneurosis 
I.Q.                 f       %           f       %           f       % 
 50- 69             57     2.5          42     2.3          15     3.6 
 70- 79            116     5.1          85     4.5          31     7.5 
 80- 89            296    12.9         235    12.5          61    14.9 
 90- 99            527    23.1         412    22.0         115    27.9 
100-109            656    28.6         552    29.4         104    25.3 
110-119            486    21.2         419    22.3          67    16.3 
120-129            141     6.2         122     6.5          19     4.6 
130-144             10     0.4          10     0.6           -       - 

Total             2289                1877                 412 
Mean I.Q.          101.0               101.7                97.8 
Standard 
  Deviation         13.96          [illegible]         [illegible] 

Of the 2289 veterans taking this test, 412 were receiving pensions for 
psychoneurotic disabilities. As is true for all the mental ability tests 
studied at this Center, results for the psychoneurotics as a group tended 
to be lower than those for the entire sample (see Table 1). For the 
Wechsler-Bellevue, the mean I.Q. of the 412 psychoneurotics was 97.8; 
for the remainder of the distribution the mean I.Q. was 101.7. The 
C. R. of 5.10 indicates that this is a statistically significant difference. 
From the data available, the reasons for the poorer showing of the 
psychoneurotics could not be determined (e.g., possible selective factors). 
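
The article does not state how its critical ratio (C. R.) was computed. A minimal sketch, assuming the conventional definition (the difference between two independent group means divided by the standard error of that difference), is given below in Python as an editorial illustration. It is applied to the Verbal and Performance means and standard deviations reported in Table 2, since both groups' standard deviations are quoted there. The function name is hypothetical.

```python
import math


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two independent means over its standard error.

    Assumed formula: CR = (M_a - M_b) / sqrt(SD_a^2 / N_a + SD_b^2 / N_b).
    """
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


# Summary figures from Table 2: no disability rating (N = 1877) versus
# disability rating for psychoneurosis (N = 412).
cr_verbal = critical_ratio(99.7, 13.68, 1877, 96.4, 14.14, 412)
cr_performance = critical_ratio(103.4, 14.37, 1877, 99.9, 14.16, 412)

print(f"Verbal I.Q.:      C.R. = {cr_verbal:.2f}")
print(f"Performance I.Q.: C.R. = {cr_performance:.2f}")
```

Under this assumed formula both part scores give ratios above 4, of the same order as the 5.10 reported for the total I.Q.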


Table 2 
Distribution of Wechsler-Bellevue Part-Test Scores 

              No Disability Rating for           Disability Rating for 
                   Psychoneurosis                    Psychoneurosis 
               Verbal        Performance         Verbal       Performance 
I.Q.           f       %       f       %         f      %       f      % 
 40- 69       42    2.24      46    2.45        18    4.3      14    3.4 
 70- 79      109    5.81      65    3.46        37    9.0      20    4.8 
 80- 89      289   15.40     216   11.50        74   17.9      58   14.1 
 90- 99      469   24.99     346   18.44       115   27.9      93   22.6 
100-109      544   28.98     522   27.81       101   24.5     128   31.0 
110-119      309   16.46     497   26.48        52   12.7      77   18.7 
120-129      102    5.44     168    8.95        13    3.2      20    4.9 
130-149       13    0.69      17    0.90         2    0.4       2    0.5 

Total       1877            1877               412             412 
Mean I.Q.     99.7           103.4              96.4            99.9 
S.D.          13.68           14.37             14.14           14.16 


Part scores for the test were studied to determine whether such a 
breakdown would reveal differences between cases who have a disability 
rating for psychoneurosis and those who do not (see Table 2). Of the 
1877 cases without ratings for psychoneurosis, the Mean Verbal I.Q. 
was 99.7 and the Mean Performance I.Q., 103.4. This compares with 
respective Means of 96.4 and 99.9 for the 412 psychoneurotics. The 
data for both groups, therefore, demonstrate superiority in the perfor- 
mance phase of the test. As previously mentioned, Rabin and Watson 
believe that the verbal scale correlates more highly with the traditional 
measures of intelligence and achievement than does the performance 
scale (4,6). On this verbal criterion, therefore, the total group involved 
in this study is of slightly below average mental ability. Wechsler him- 
self concludes that performance scores surpass verbal except for those of 
above average intelligence (7). He also classifies young psychopaths 
and behavior problems in the group scoring higher on performance tests 
(7). Levi and Weider are in agreement (8). Several investigators con- 
clude that when the verbal scale is higher there is indication of certain 
psychoses, usually schizophrenia (8). On the other hand, Gurvitz in 
studying 4500 successive admissions (851 psychopaths and 3649 non- 
psychopaths) to the Federal Penitentiary at Lewisburg, Pa., concludes 
that intelligence is not a significant factor in the diagnosis of psychopathic 
personality (9). Brown, too, showed that for 13,454 adult male admis- 
sions to the Illinois State prisons from 1930-1936 the average intelligence 
test scores were similar to those for the general population, but with a 
more heterogeneous distribution including a disproportionate number of 
mentally retarded and mentally defective men (10). Harris reports that 
a study of individual cases “has led to the tentative generalization that 
within wide ranges of different kinds of personality and behavior 
problems, and within a wide range of ‘cooperation’ in the test situation, a 
measured I.Q. remains fairly stable, even after intensive therapy and 
improvement in adjustment” (11). The results of the study at the Temple 
University Center show that veterans rated as psychoneurotics average 
3.9 I.Q. points lower than those without such ratings, and that both groups 
score somewhat higher on the performance scale than on the verbal scale, 
the mean difference being of the same order of magnitude in each group 
(approximately 3.5 I.Q. points). 


Since the Wechsler-Bellevue was the test which was usually given to 
the literate veteran of limited schooling (less than ten grades), this 
weighting of the population at the lower educational levels probably 
contributed to the slightly below average verbal I.Q. obtained and the 
discrepancy in favor of performance scores. 


Table 3 
Distribution of California I.Q.’s 

I.Q.              f        % 
 65- 69           3      0.5 
 70- 79           6      1.1 
 80- 89          46      8.1 
 90- 99         106     18.6 
100-109         142     24.8 
110-119         128     22.4 
120-129          84     14.8 
130-139          39      6.8 
140-149          17      3.0 

Total           571 
Mean I.Q.       109.5 
S.D.             15.04 


The California Test of Mental Maturity was administered to a dif- 
ferent sample of 571 cases (see Table 3). The median I.Q. obtained was 
109, the mean I.Q. 109.5 and the standard deviation 15 points. This 
distribution is much higher than that for the “normal” population (based 
on 100,000 persons) published in the Manual of Directions for the test 
(12); it exceeds also the published norms for twelfth grade students and 
approaches the published norms based on college freshmen (see Table 4). 
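
The summary figures quoted for the California test can be approximated from the grouped frequencies in Table 3. The Python sketch below is an editorial illustration that makes two assumptions not stated in the article: each class is treated as running from its lower limit up to the next class's lower limit (so 100-109 covers 100 to 110), and cases are taken as evenly spread within a class. Because the authors presumably worked from ungrouped scores, only approximate agreement with the reported mean (109.5), median (109) and standard deviation (15.04) should be expected.

```python
import math

# (lower limit, upper limit, frequency), transcribed from Table 3.
TABLE_3 = [
    (65, 69, 3), (70, 79, 6), (80, 89, 46), (90, 99, 106), (100, 109, 142),
    (110, 119, 128), (120, 129, 84), (130, 139, 39), (140, 149, 17),
]


def grouped_mean_sd(rows):
    """Mean and standard deviation using class midpoints (assumed class = [lo, hi + 1))."""
    n = sum(f for _, _, f in rows)
    mids = [((lo + hi + 1) / 2, f) for lo, hi, f in rows]
    mean = sum(mid * f for mid, f in mids) / n
    variance = sum(f * (mid - mean) ** 2 for mid, f in mids) / n
    return mean, math.sqrt(variance)


def grouped_median(rows):
    """Median by linear interpolation within the class containing the middle case."""
    half = sum(f for _, _, f in rows) / 2
    below = 0
    for lo, hi, f in rows:
        if below + f >= half:
            return lo + (half - below) / f * (hi + 1 - lo)
        below += f
    raise ValueError("empty distribution")


def percent_at_or_above(rows, cutoff):
    """Percentage of cases in classes whose lower limit is at least `cutoff`."""
    n = sum(f for _, _, f in rows)
    return 100 * sum(f for lo, _, f in rows if lo >= cutoff) / n


mean_iq, sd_iq = grouped_mean_sd(TABLE_3)
median_iq = grouped_median(TABLE_3)
share_110_up = percent_at_or_above(TABLE_3, 110)

print(f"mean {mean_iq:.1f}, median {median_iq:.1f}, S.D. {sd_iq:.1f}")
print(f"I.Q. of 110 or higher: {share_110_up:.0f}% of cases")
```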


Table 4 

California Test of Mental Maturity: Comparison of Published Medians and Standard 
Deviations with Those Obtained at the Temple Center (10, p. 19) 

                                                   Median      S.D. 
Normal Population I.Q.’s (N = 100,000)              100.0      16.0 
Ninth Grade I.Q.’s (N = 25,000)                     101.5      15.5 
Tenth Grade I.Q.’s (N = 25,000)                     103.0      15.5 
Eleventh Grade I.Q.’s (N = 25,000)                  104.0      15.5 
Twelfth Grade I.Q.’s (N = 25,000)                   105.0      15.0 

Veterans at Temple Center I.Q.’s (N = 571)          109.0      15.0 

College Freshmen I.Q.’s (N = 15,000)                110.0      14.0 
College Graduate I.Q.’s (N = 2,000)                 125.0      12.0 


It is noted that an I.Q. of 110 or higher was obtained for almost half 
(47%) of the 571 veterans taking the California Test of Mental Maturity 
at the Temple University Center, as opposed to 21% of the 2289 veterans 
taking the Wechsler-Bellevue. In a recent study, H. M. Hildreth, Chief 
Clinical Psychologist, V. A. Branch 12, analyzes the California Mental 
Maturity Test scores for 248 older veterans (not including any of World 
War II). On total test performance his group is classified as dull-normal 
(median I.Q. 85.4), despite the fact that Hildreth grants that the norms 
for this test are generally considered as too high. The age (median 57.6 
years) and education (median grade 8.5) factors probably contribute to 
the low scores obtained (13). 

At the Temple University Center it is recognized that a selective 
factor was in effect, since the California Test of Mental Maturity was 
seldom administered to veterans having less than ten grades of education. 
Furthermore, a large number of these veterans were contemplating 
college entrance and requested advisement under P. L. 346. It must be 
stressed that these G. I.’s (self-selected) would be expected to be above 
average individuals as opposed to P. L. 16 Rehabilitation cases (required 
to report for advisement) which should be a more representative group. 
The number of psychoneurotics given the California Test was not suffi- 
ciently large to warrant a separate study. 


Summary 


1. A study was made of the distributions of obtained I.Q.’s for one 
sample on the Wechsler-Bellevue Scale (N = 2289); and for a second 
sample on the California Test of Mental Maturity (N = 571). 

2. Local Wechsler-Bellevue norms computed for the group of veterans 
studied at the Temple University Veterans’ Administration Guidance 
Center agree closely with the published norms, although veterans with 
a disability rating for psychoneurosis scored somewhat lower. 

3. For the sample studied on the California Test of Mental Maturity, 
the I.Q.’s obtained appear to be somewhat higher than might be expected 
with a random sample of veterans. Contributing factors to the superior 
showing are the higher than average educational level of this group and 
its self-selected character. 


Received April 23, 1948. 


The Effects of Eliminating Binocular and Peripheral 
Monocular Visual Cues upon Airplane 
Pilot Performance in Landing * 


Stanley N. Roscoe 
University of Illinois 


Throughout the history of modern scientific psychology the visual 
perception of relative position and movement in space has held a promi- 
nent place in experimental and theoretical literature (1). Even so, our 
present understanding of the subject is not adequate to solve all of the 
perceptual problems which arise in connection with the designing of air- 
planes which will carry human pilots faster than the speed of sound. 
This paper is concerned with some of the specific problems of depth and 
movement perception which present themselves when we consider the 
possibility that some of our future supersonic airplanes may be designed 
to be flown without direct outside visibility. 

In such airplanes position and movement information must necessarily 
be presented to the pilot exclusively by instrumental means. Either 
television or radar, or both, may be employed for this purpose. However, 
all such devices as yet developed (2) present a restricted visual field on a 
flat surface similar in appearance to a small motion picture screen. 
Whenever three dimensional space is represented on a two dimensional surface, 
each eye sees the same restricted image, thus eliminating binocular 
disparity and parallax and the crossed and uncrossed double images normally 
present in binocular vision. Furthermore, any such display would 
restrict peripheral outside visibility. 

Considering the emphasis which has traditionally been placed on un- 
restricted binocular vision in the selection and training of pilots and in 
the design of equipment, one might suspect that the restrictions imposed 
by a small, flat visual field would result in a serious impairment of pilot 
performance in the flight situation. Fortunately there is no experimental 
evidence to indicate that such restrictions would make it impossible to 
fly an airplane. In fact, there is considerable evidence that the binocular 
visual cues are not particularly effective in the perception of spatial 
relationships at such relatively great distances as are involved in the flight 
situation.¹ Furthermore, the perception of spatial relationships while 
in flight is not an end in itself. The pilot’s immediate task is to maneuver 
his plane through a desired path in relation to other objects in space, i.e., 
to move the controls properly, and it is possible that this can be done 
without the degree of perceptual discrimination afforded by unrestricted 
binocular vision. 

* This research was carried out under Contract N6ori-71, T. O. XVI, between the 
Special Devices Center, Office of Naval Research, and the University of Illinois. This 
paper is based upon Report No. 5 under that contract. The writer wishes to express 
his appreciation to Professor A. C. Williams, Jr., who directed the research. 


The Problem 


Considering these facts, the present experiment was designed to in- 
vestigate the effects of eliminating the binocular visual cues in the flight 
situation and of restricting the angular range of visibility, which neces- 
sarily occurs when a visual field is presented on a small, flat surface. 
Two questions to be answered by the experiment were: (1) Can successful 
flights be made at all under the above-mentioned conditions of restricted 
visibility, and (2) if successful flights can be made at all, what will be the 
quality of the pilot performance? 


Description of the Experiment 


In order to determine the effects of various conditions of visibility upon 
pilot performance in a specific flight maneuver, the experiment was designed 
to measure and contrast the accuracy with which experienced pilots make 
“spot” landings in two experimental situations and one control situation: 

Situation “A” (experimental): the restriction of the outside visual field to 
a square area described by horizontal and vertical angles of approximately 
10 degrees each and the elimination of binocular visual cues by the use of a 
projection periscope image cast on a ground glass screen. 

Situation “B” (experimental): a similar restriction of the range of visibility 
by the use of vision reducing goggles and a vision directing screen, but without 
the elimination of binocular visual cues. 

Situation “C” (control): unrestricted binocular visibility in the normal 
contact flight situation. 

The experimental flights were conducted in a modified Cessna T-50 
airplane. The task was to make an approach to a landing from straight and 
level flight at an altitude of 800 feet and at a distance of more than one and 
one-half miles from the end of the landing runway. This task was selected as 
the one flight maneuver generally conceded by pilots and pilot instructors to 
require the greatest depth and movement discrimination. 

The criterion for the goodness of pilot performance was the accuracy of 
the landing touchdowns in relation to a designated landing “spot,” since 
Walker et al. (5) found this to be the best single measure of the overall 
performance of experienced pilots performing this specific task. The desired 
landing “spot” was defined by white target panels placed on each side of the 
runway at approximately one-fourth of the way along the landing strip. 

Situation “A.” The projection periscope was selected as the most practical 
and economical device for presenting the visual field on a small two dimensional 
surface. This instrument cast an image on a ground glass screen without 
markedly reducing luminosity, clarity, or color. With respect to the 
elimination of binocular disparity, parallax and crossed and uncrossed double images, 
this device was effectively equivalent to television. Only the so-called 
monocular visual cues were present, and they were restricted to central vision. They 
included: (1) shading: both “cast” and “attached” shadows, (2) the sequence 
of objects in space, i.e., the partial covering of far objects by near, (3) four 
types of perspective: (a) linear or angular, (b) detail, (c) aerial: the partial 
loss of object color of far objects, and (d) movement: the slower apparent 
movement of distant objects. 

¹ Woodworth (6, p. 680) concludes: “Except for ‘close work,’ the manipulation of 
small objects right before the eyes, the binocular cues are probably less important than 
covering, shading, and the different kinds of perspective.” 

Detail perspective, while present, was somewhat distorted by the differ- 
ential clearness of focus of the periscopic image for objects at different distances. 
Head movement parallax, although a monocular visual cue, is not effective 
when viewing an image cast on a small flat surface. The non-visual cues of 
accommodation, convergence, and change in the pupil were probably ineffective 
in any of the three situations due to the relatively great distances involved. 
However, they were definitely eliminated by the periscope, since all objects in 
the visual field, that is, in the image on the screen, were presented at the same 
distance from the subject’s eyes, no matter what their actual distances from 
him might be. 

Situation “B.” Since the projection periscope, and also television, of 
necessity presents a single restricted visual field, it might be expected that the 
pilots’ performances would be affected by the elimination of peripheral visual 
cues (4) and head movement parallax which are supposedly of great value to 
the pilot in landing an airplane. Therefore a second combination 
experimental-control situation (“B”) was included in which the visual field was 
restricted to the same extent as in the case of the periscopic image but without 
the elimination of binocular depth effects. This was accomplished by the use 
of vision reducing goggles and a vision directing screen to be described later. 
Also head movement parallax was reduced by instructing the subjects not to 
move their heads in order to obtain successive views of different outside areas 
through the vision directing screen (3). 

Situation “C.” Landing performance while flying with unrestricted 
binocular visibility was used as a control. By comparing pilot performances in 
situations “A” and “B” with each other and with performance in the control 
situation (“C”), measures were obtained indicating what effects on pilot 
performance can be attributed to the elimination of peripheral monocular visual 
cues and head movement parallax and what to the elimination of binocular 
visual cues. 


Description of the Apparatus 


The Airplane. A Cessna T-50 type aircraft was modified in the following 
ways: (1) the overhead plexiglass windows were removed, and sheet aluminum 
panels were installed in their place; (2) an opening approximately six inches 
square was cut in the left overhead panel to permit the projection of the peri- 
scope head through the top; and (3) semi-permanent metal braces were attached 
to the instrument panel and overhead panel to facilitate rapid installation and 
removal of the periscope. 

The Periscope. The projection type periscope (see Figure 1) constructed 
for use in situation “A” cast a television-like image on a two dimensional 
ground glass surface. The lens used was three and three-quarter inches in 
diameter and of 30 inch focal length. Since a lens casts an inverted and 
reversed image, four mirrors were used in the periscopic system (see schematic 
diagram in Figure 2) in order to present a correctly oriented image. First 
surface type Panchronized mirrors were used. They were six by six inches 
square. Each mirror was placed at an angle of 45 degrees to the principal 
axis of its incident rays. The lens was placed in the system between the first 
and second mirrors at a distance of 30 inches from the screen upon which the 
image focused. The screen was six by six inches square and was placed 
perpendicular to the principal axis of its incident rays. The instrument was 
installed so that the image was perpendicular to the subject’s line of sight. The 
image included a range of outside visibility described by a horizontal angle of 
10° 40’ and a vertical angle of 11° 50’. 

Fig. 1. Subject’s cockpit for Situation “A” showing periscope, hood and windshield 
cutouts. The ground glass screen is shown above control wheel. 

Fig. 2. Schematic diagram of the projection periscopic system.

The Vision Reducing Goggles. A special vision reducing eyepiece was con-
structed for use in situation “B.” A common laboratory type tubular vision
reducer (see Figure 3) was modified so as to restrict the visual field of each eye
individually to the same angular range of visibility (10° 40′ horizontal and
11° 50′ vertical) as presented by the periscopic image in situation “A.” This
was done by inserting an opaque screen in the vision reduction tube of the
eyepiece perpendicular to the subject’s line of sight and approximately one
inch from the subject’s eyes. Two slots, each [fraction illegible] of an inch
high and [fraction illegible] of an inch wide and separated by a horizontal
distance of 2 [fraction illegible] inches, were cut in the screen so as to
present an opening in front of each eye. Over each of these slots was fitted
an individual movable slide with an aperture approximately [fraction
illegible] of an inch square (see Figure 3). These slides were adjustable
horizontally so as to compensate for the variability in the distances between
the eyes of different subjects. It was found experimentally that these openings
of approximately [fraction illegible] of an inch square at a distance of
approximately one inch from the subject’s eyes allowed each eye the desired
range of visibility, thus presenting central binocular vision of the same
visual angle (both horizontally and vertically) as that presented on a flat
surface by the periscopic image.


Fig. 3. Vision reducers as modified for use in Situation “B” showing the
[fraction illegible] inch square apertures in adjustable slides in front of
each eye.


In order to prevent the subject (in situation “B”) from obtaining successive
images of more than one outside area by moving his head and also to restrict 
head movement parallax, a metal screen, made of sheet aluminum and painted 
black, was attached to the top of the instrument panel perpendicular to the 
subject’s line of sight and covering the area of his front windshield. A rec- 
tangular aperture, 4.145 inches high by 6.234 inches wide, was cut in the screen 
so as to allow the subject binocular vision of only one outside area directly 
forward from the airplane (see Figure 4). The area seen was the same as that 
presented by the periscopic image. The size of this aperture was determined 
trigonometrically, using 20 inches as the approximate average distance from 
the subject’s eyes to the screen and allowing for the average distance of
approximately 2.5 inches between human eyes.

Fig. 4. Subject’s cockpit for Situation “B” showing vision directing screen,
cockpit hood and vision reducing goggles worn by subject.
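
The report does not print the trigonometric computation used for the aperture in the vision directing screen. The sketch below is one reconstruction, offered as an assumption rather than as the authors' worksheet: each dimension is taken as twice the eye-to-screen distance times the tangent of half the visual angle, and the 2.5 inch eye separation is added to the width only. The function and constant names are illustrative. Under these assumptions the computed values agree with the quoted 6.234 by 4.145 inch aperture to three decimal places.

```python
import math


def angle_in_degrees(degrees, minutes=0.0):
    """Convert an angle given in degrees and minutes of arc to decimal degrees."""
    return degrees + minutes / 60.0


def aperture_extent(visual_angle_deg, eye_to_screen_in, eye_separation_in=0.0):
    """Extent (inches) of an opening that subtends `visual_angle_deg` at the eye.

    Assumed geometry: the opening is centered on the line of sight, so each
    half of it subtends half the visual angle. For the horizontal dimension
    the separation between the two eyes is added so that both eyes see the
    same outside area.
    """
    half_angle = math.radians(visual_angle_deg / 2.0)
    return 2.0 * eye_to_screen_in * math.tan(half_angle) + eye_separation_in


# Values quoted in the text.
EYE_TO_SCREEN_IN = 20.0    # approximate average eye-to-screen distance
EYE_SEPARATION_IN = 2.5    # approximate average distance between the eyes

width = aperture_extent(angle_in_degrees(10, 40), EYE_TO_SCREEN_IN, EYE_SEPARATION_IN)
height = aperture_extent(angle_in_degrees(11, 50), EYE_TO_SCREEN_IN)

print(f"width  = {width:.3f} in")
print(f"height = {height:.3f} in")
```

The agreement suggests, without proving, that the eye separation entered only the horizontal dimension of the aperture.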


The Cockpit Hood. A hood was constructed of black “Leatherette” for use
in situations “A” and “B.” It was attached around the subject pilot’s cockpit
by a series of snap fasteners so as to prevent the subject from seeing out either
to his right or left or to the rear. For situation “A” the windshields in front
of the subject were covered by cardboard cutouts which were held in place by 
masking tape (see Figure 1). In addition to preventing any direct outside 
visibility, this combination of hood and windshield masks darkened the interior 
of the cockpit to such a degree that a clear, bright periscopic image could be 
seen while permitting enough light within the cockpit to allow the subject to 
use the other flight instruments (see Figure 1). 

The Landing Targets. Two portable landing target panels, each three feet
high and fifteen feet wide, were constructed of white “Linene” cloth attached
to a wooden framework. One panel was placed at each side of the runway
opposite the landing “spot.” They were inclined at an angle of approximately
45 degrees to the surface of the runway so as to stand out more clearly in the 
visual field during the landing approach. 


Subjects 


Six subjects were tested in the experiment. All were flight instructors 
at the University of Illinois Institute of Aeronautics. Each was a 
qualified commercial pilot with both single and multi-engine ratings as 
well as C.A.A. Instrument and Instructor ratings. Each had at least 
1750 hours of flying time at the beginning of the experiment. Each had 
flown more than one multi-engine airplane and was familiar with the 
particular airplane used in the experiment. None of the subjects had 
previously participated in any experiment of this kind.


The Experimental Design


Each of the six subjects was tested in each of the three experimental 
situations (periscope, goggles, contact), thus requiring eighteen separate 
experimental periods to complete the three series. The task was per- 
formed five successive times during each experimental period, ninety 
landing approaches being made in all, thirty in each of the situations. 

In order to balance any possible practice effects or transfer from one
situation to another, a different sequence of situations was presented to
each of the six subjects. By this arrangement, each of the three experi-
mental situations was met as the first series by two subjects, as the second
series by two subjects, and as the third series by two subjects, thus
distributing equally among the three situations any advantage which might
result from previous practice in another situation.
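
The balancing property of this design can be verified by enumeration. The sketch below is illustrative only: it generates the six possible sequences of the three situations and counts how often each situation falls in each ordinal position. The pairing of sequences with subject numbers is arbitrary here and is not the assignment used in the experiment.

```python
from collections import Counter
from itertools import permutations

SITUATIONS = ("A", "B", "C")  # periscope, goggles, contact

# With three situations there are exactly 3! = 6 distinct sequences,
# one for each of the six subjects.
orders = list(permutations(SITUATIONS))

# Hypothetical pairing of subjects with sequences, for illustration only.
assignment = {subject: order for subject, order in enumerate(orders, start=1)}

# Count how many subjects meet each situation in each ordinal position.
position_counts = Counter(
    (position, situation)
    for order in assignment.values()
    for position, situation in enumerate(order, start=1)
)

for subject, order in assignment.items():
    print(subject, " -> ".join(order))
for (position, situation), count in sorted(position_counts.items()):
    print(f"series {position}, situation {situation}: {count} subjects")
```

Every situation appears twice in every position, which is the sense in which practice and transfer effects are distributed equally among the three situations.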


Procedure 


The take-offs and the first three legs of the rectangular patterns were flown 
by the safety pilot. The wheels were not retracted and the landing flaps were 
not used. The safety pilot made the final turn onto the landing approach. 
When the airplane was lined up with the runway at an indicated altitude of 
1550 feet (800 feet above the field elevation) and an indicated air speed of 90 
miles per hour, the safety pilot signaled the subject to take the controls by 
wiggling the ailerons. As the subject reduced power for the glide, the safety
pilot trimmed the plane for a 90 mile an hour gliding speed. The subject was
instructed to try to maintain that air speed and control the descent with power. 
He was allowed to make any type of landing, but it was recommended that he 
make tail high front wheel landings using a little power, as the runway was not 
visible on the periscope screen with the plane in the three-point attitude. The 
safety pilot was prepared to take over in the event the subject was unable to 
make a safe landing approach, but otherwise did not touch the controls except 
to avoid other airplanes. As soon as the subject had made a touchdown, the 
safety pilot took over again and made a follow-through takeoff for the next 
trial. At the beginning of each experimental series for each subject, the safety 
pilot made one demonstration landing so that the subject could observe how 
much his visibility would be restricted in the particular situation and what he 
could expect to see during the landing approach. Before any trials were made 
in situation “A,” the safety pilot pointed out, during his demonstration landing, 
the position of the runway image on the periscope screen during the approach 
glide and how this position changed during the flare out to the level flight 
attitude just before landing. Before any trials were made in either situation 
“A” or “B,” it was explained to the subject that in the event of a cross-wind 
any drift correction would have to be made by slipping rather than by crabbing, 
so as to keep the runway within the range of visibility. 

An assistant was stationed on the ground just opposite the landing “spot”
to mark and record the first point of touchdown. The paved runways hap- 
pened to be conveniently divided into sections with expansion joints at twenty 
foot intervals. Also the survey stationing was marked in the paving every 
100 feet. By referring to these stations it was possible to record the errors in 
the accuracy of the landings to the nearest foot, as shown by the tire marks 
left on the runway after landings. 

While no specific weather condition requirements were set up, no flights 
were made in which the turbulence of the air or the direction and velocity of
the wind noticeably affected the accuracy of the subjects’ performances. No
flights were made during times when the sky was overcast, thus assuring
maximum luminosity of the periscopic image in situation “A.”


Results 


The first question to be answered by the experiment was whether ex- 
perienced pilots could make successful landing approaches at all using 
only those outside visual cues presented by a projected periscopic image. 
Of a total of thirty landings made with the periscope, the pilots completed 
all but two without assistance from the safety pilot. On first trials alone, 
five of the six pilots completed their approaches without assistance. 
Both unsuccessful approaches (one a first trial and one a third trial) 
occurred because the pilots failed to correct for wind drift, allowing the 
image of the runway to pass out of view on the periscope screen. Appar- 
ently these errors did not arise from difficulties with depth perception, for 
when the plane was returned to the proper glide path by the safety pilot, 
the subjects were able to complete the landings without further assistance. 

Most pilots normally use a combination of “slip” and “crab” to correct
for wind drift during contact landing approaches, and this technique 
was ineffective with the angular visual limitations imposed by the peri- 
scope. Drift correction with the periscope required a special straight 
ahead “slipping” technique which was adopted by all the pilots and em- 
ployed successfully when necessary on all approaches but the two men- 
tioned above. 

Since successful landings were made with the periscope, it would be 
expected that they could also be made with the goggles. This proved 
to be the case, as all landings made with the goggles and the vision 
directing screen were completed without assistance. 

The safety of the landings, both those made with the periscope and 
with the goggles, was judged acceptable² by the safety pilot. These
judgments included a consideration of the initial approach, the “flareout,”
and the actual touchdown. The pilots reported confidence in determining 
the proper time to flare out the approach for the landing touchdown, and 
although they frequently “skipped” or bounced on landings (which they 
seldom did with unrestricted vision), they did not lose control of the 
airplane when the wheels touched the ground. 

² In pilot vernacular, a “good” landing is “any one you can walk away from.”
These experimental landings were ones you could fly away from.

The second question, to be answered in the event safe landings could
be made at all with the periscope and with the goggles, involved an
objective evaluation of the quality or goodness of the pilot performance.
The criterion for this objective rating was the accuracy of the landing
touchdowns. The raw error scores in feet from the landing “spot” for
each of the ninety landings performed by the six pilots are shown in
Table 1. An inspection of the table reveals that the most accurate
landings were made with unrestricted visibility, while the least accurate
landings were made with the periscope. Table 2 shows the average or mean
errors (signs disregarded) for the thirty landings made in each of the
three situations. The average error for contact landings was 85.1 feet
from the landing “spot,” for goggle landings 142.4 feet, and for periscope
landings 251.8 feet. The difference between the average accuracy of the
landings made with the periscope and those made in the control situation
was significant at the 1% level. The difference in the accuracy of the
landings made with the periscope and with the goggles was also significant
at the 1% level. The difference between goggle landings and contact
landings was significant at the 5% level but not at the 1% level (see
Table 3). Before using the massed data to make the above comparisons, the
error scores for the individual trials of the six subjects in the three
different situations were correlated. The coefficients were all
insignificant.


Table 1
Tabulation of Raw Data

The errors in feet for each of the ninety landing approaches made in the experiment
are listed according to Subject and Situation. Each group of five scores represents
the five approaches made by one subject in one situation during one experimental
period. The scores are listed in the same order in which the trials were performed
during that period. The numbers in parentheses indicate the order in which each sub-
ject was tested in each of the three situations.

                   Situation “A”    Situation “B”    Situation “C”
Subject   Trial    (Periscope)      (Goggles)        (Contact)

   1        1      -315             +474             -174
            2      -308             +6               -7
            3      -212             -223             -101
            4      -232             -230             +9
            5      +489 (1)         -110 (2)         -32 (3)

   2        1      +300*            +275             +370
            2      -270             0                -70
            3      +302             -70              +100
            4      -286             +60              +163
            5      +500 (1)         +40 (3)          -26 (2)

   3        1      -620             +20              -44
            2      +70              -10              +17
            3      +50              +60              +53
            4      +340             -135             -10
            5      +510 (2)         -120 (1)         +50 (3)

   4        1      -402             -406             -22
            2      -255             -159             -30
            3      -280             +310             -32
            4      +210             -178             +52
            5      -197 (3)         +190 (1)         -62 (2)

   5        1      -23              +190             +180
            2      +210             0                -380
            3      [illegible]      -280             +155
            4      338              -170             +37
            5      -188 (2)         -302 (3)         -117 (1)

   6        1      +170             +31              -168
            2      +10              +17              -40
            3      -20*             +15              -17
            4      +42              +150             -15
            5      -355 (3)         -40 (2)          +30 (1)

* Indicates approaches in which the subject would not have made a successful
landing without assistance from the safety pilot.


Table 2
Summary of the Results that Obtain when Performance Scores of the Six Pilots Are
Pooled and Measures of Central Tendency and Variability Are Computed

The constant errors, average errors, standard deviations, and the standard errors of
the means (average errors) for the thirty landing approaches made in each of the three
situations. Values are expressed in feet.

                               Situation “A”    Situation “B”    Situation “C”

Constant Error                 -15.7            -19.8            -4.0
Average or Mean Error          251.8            142.4            85.1
Standard Deviation             155.7            124.5            94.0
Standard Error of the Mean     28.9             23.1             17.4


Table 3
Significance of the Differences between Group Mean Performances in
Different Experimental Situations

Product-moment correlation coefficients between the raw error scores (signs dis-
regarded) of the individual landings of all subjects in the three situations,* differences
between group means for the different situations, standard errors of the differences
between the correlated means, critical ratios, and the level of significance of the differ-
ences between means as determined by Fisher’s Test.

Situations “A” and “B”     Situations “A” and “C”     Situations “B” and “C”

r_AB = .033                r_AC = -.132               r_BC = .136
M_A - M_B = 109.4          M_A - M_C = 166.7          M_B - M_C = 57.3
σ_diff = 36.4              σ_diff = 35.6              σ_diff = 27.0
CR = 3.01                  CR = 4.68                  CR = 2.12
Difference significant     Difference significant     Difference significant
at 1% level                at 1% level                at 5%, not at 1% level

* The scores were matched for correlation as they appear in Table 1. Thus each
subject’s 1st, 2nd, 3rd, 4th, and 5th landings in each situation were matched with his
corresponding landings in each of the other situations.
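
The summary figures in Tables 2 and 3 can be checked against one another. The sketch below rests on two assumptions not stated in the report. First, the Situation “B” scores are taken as transcribed from Table 1, with −280 read as the goggle score for Subject 5's third trial, where the printed row is damaged. Second, the standard error of the difference between correlated means is taken to follow the usual formula sqrt(SE1² + SE2² − 2·r·SE1·SE2). Function and variable names are illustrative.

```python
import math

# Situation "B" (goggles) errors in feet, subject by subject, as read from
# Table 1. The value -280 (Subject 5, trial 3) is an assumed reading.
GOGGLE_ERRORS = [
    474, 6, -223, -230, -110,
    275, 0, -70, 60, 40,
    20, -10, 60, -135, -120,
    -406, -159, 310, -178, 190,
    190, 0, -280, -170, -302,
    31, 17, 15, 150, -40,
]


def constant_error(errors):
    """Mean signed error (positive and negative errors cancel)."""
    return sum(errors) / len(errors)


def average_error(errors):
    """Mean error with signs disregarded."""
    return sum(abs(e) for e in errors) / len(errors)


def se_of_difference(se_1, se_2, r):
    """Standard error of the difference between two correlated means."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2 - 2.0 * r * se_1 * se_2)


# Standard errors of the means from Table 2.
SE_MEAN = {"A": 28.9, "B": 23.1, "C": 17.4}

# (difference between means, correlation) for each comparison in Table 3.
COMPARISONS = {
    ("A", "B"): (109.4, 0.033),
    ("A", "C"): (166.7, -0.132),
    ("B", "C"): (57.3, 0.136),
}

results = {}
for (first, second), (mean_diff, r) in COMPARISONS.items():
    se_diff = se_of_difference(SE_MEAN[first], SE_MEAN[second], r)
    results[(first, second)] = (se_diff, mean_diff / se_diff)

print(f"goggles: constant error {constant_error(GOGGLE_ERRORS):.1f}, "
      f"average error {average_error(GOGGLE_ERRORS):.1f}")
for pair, (se_diff, cr) in results.items():
    print(pair, f"SE of difference {se_diff:.2f}, critical ratio {cr:.2f}")
```

Under these assumptions the computed values agree with the tabled ones to within rounding. The critical ratios differ in the second decimal only because the table evidently divides by the already rounded standard errors.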

A further result, incidental to the purpose of the experiment and 
provided for in the design, appeared from the analysis of practice effects. 
An inspection of the raw error scores in Table 1 reveals the absence of 
any intra-serial practice effects either with the goggles or the periscope. 
Apparently five trials made in rapid succession in the new situations had 
as much fatigue effect as practice effect. The subjects reported the 
new instruments to be most exacting. To test for inter-serial gains, or 
transfer-practice effects resulting from practice on the same task under 
different conditions, the raw error scores from Table 1 were converted to 
standard scores and rearranged according to whether they were performed 
during each subject’s first, second, or third experimental period, regard- 
less of the situation in which the subject was tested. The results of the 
various computations are not shown, but it was found that there was 
consistent improvement, the improvement from the first periods to the 
third periods being significant at the 5% level. 


Discussion 

Since there were no similar or directly related studies upon which pre- 
dictions could be based for the present investigation, the experiment was 
oriented as a direct attack at the applied rather than the pure problem 
of space and movement perception in the flight situation. It was not 
known whether pilots could make landings at all with the outside visual 
field presented on a small, flat surface. The results demonstrated that 
this could be done. It remains to discuss what further inferences can 
be drawn from the accuracy of the performances in the three experimental 
situations. 

If it were true that binocular depth effects contribute little to space 
and movement perception at such relatively great distances, it would be 
expected that landing accuracy would be affected approximately the same 
amount by the goggles as by the periscope. This was not the case. This 
may be interpreted in three possible ways. First, it is possible that the 
binocular depth cues as such are of significant importance. This could 
easily be tested in another experiment simply by covering one eye and 
comparing the accuracy of the landings made in this condition with the 
results from situation “C.” It is doubtful that the difference would be 
significant. 

A second interpretation would postulate that it is the amount rather 
than the kind of visual restriction imposed that determines the accuracy
of the performance, i.e., the more cues taken away, the greater the errors
in landing accuracy. Thus when the binocular cues are eliminated in 
addition to restricting the angular range of visibility, as was done by the 
periscope, errors become significantly greater than when only the angular 
range is restricted. 

The third possibility is that the apparatus used did not achieve an 
equivalent restriction of peripheral vision in the two experimental situa-
tions as it was designed to do. The restrictions imposed by the periscope
were definitely known. The design and use of the goggles rendered the 
accurate determination of the true amount and nature of their restricting 
effects most difficult. A consideration of the techniques employed by 
the pilots in the various situations reveals a possible source of error for 
the results obtained with the goggles. 

While flying with unrestricted visibility, the pilots were able to keep 
the landing targets in view at all times until the landing had been made. 
They reported this cue most effective in making accurate landings in 
situation “C.” While flying with the periscope or with the goggles,
the targets, which were of necessity placed at the sides of the runway, 
passed out of the visual field while the airplane was still several hundred 
feet from the landing “spot.” When flying with the periscope the sub- 
jects had no control over this, but with the goggles they could do some- 
thing about it. Although they could not increase their angular range of 
visibility for any given moment, they could, by shifting their position in 
the cockpit, change the area of visibility which they saw through the 
vision directing screen. By simply leaning forward and slightly to the 
right, they could keep the target on the left side of the runway in sight 
for a few seconds—and several hundred feet—longer than it could be 
seen with the periscope. The subjects were instructed not to shift their 
position for this purpose, and while they did not do so consciously or 
intentionally, according to their reports, the additional advantage was 
available to them, and it is possible that they did make use of it. Also it 
is suspected that this may have accounted for the absence of serious 
errors in drift correction in situation “B.” By leaning to the upwind
side of the cockpit, the runway could be kept in sight through the vision 
directing screen even though the airplane were slightly crabbed. 

That this was possible represents a fault in the experimental procedure 
which can only be corrected in later investigations. The validity of the 
results for situation “B” is to be suspected. If the true difference be-
tween the accuracy of performances in situations “B” and “C” were
greater than it appears, the true difference between performances in
situations “A” and “B” would be correspondingly less, indicating the
greater relative importance of the peripheral monocular visual cues in
the flight situation.




The present investigation demonstrated that pilots can make success- 
ful approaches to landings both with a simple angular restriction of the 
peripheral visual field and with a similar angular restriction of outside 
visibility plus the effective elimination of the binocular cues of depth 
and movement. The significant differences in the accuracy of the pilots’ 
performances while flying with the periscope, with the vision restricting 
goggles and vision directing screen, and in the control condition of un- 
restricted visibility, indicate both of these groups of visual depth and 
movement cues to be important in the accurate landing of an airplane. 
While the present results suggest the binocular to be relatively more im-
portant than the peripheral monocular cues, the validity of these partic-
ular results is to be questioned in view of the unforeseen procedural
difficulties encountered with the goggles and the vision directing screen.
However, the important conclusion to be drawn from this investigation is 
that pilots are successful in making use of whatever outside visual cues 
are presented to them while making airplane landing approaches. 


Summary 

Six instrument pilots were tested for accuracy of landing under these 
conditions of visibility in a Cessna T-50 aircraft: 

Condition A. Outside vision restricted to an image cast on a ground 
glass screen by a projection periscope. The image confined the range of 
outside visibility to a visual angle of approximately 10 degrees, both 
horizontally and vertically, and binocular cues of depth and movement 
were eliminated. 

Condition B. Similar restriction of the outside visual field achieved 
by use of vision reducing goggles and a vision directing screen. Binocular 
cues were present. 

Condition C. Unrestricted outside visibility as in normal contact 
flight. This situation served as control. 

Every pilot made five landings under each condition, ninety landings 
being made in all. Conditions were rotated among pilots to balance 
practice effects. 

Safe approaches and landings were made by all pilots in all conditions. 
Landings were most accurately made in Condition C, control, where the 
average error was 85.1 feet from the landing “spot” (sign of error dis-
regarded). The average error for landings in Condition B (vision reducing
goggles) was 142.4 feet. Least accurate landings were made in Condition 
A (periscope) where the average error was 251.8 feet. These differences 
were found to be significant in each case, but the difference between 
periscope and goggles may be exaggerated as a result of certain unforeseen 
procedural difficulties which were uncontrolled. 


Received April 8, 1948.


Prediction of Male Readership of Magazine Articles *


Evelyn Perloff** 
Ohio State University 


The purpose of this study was to determine the way in which five 
variables combined for maximum readership of articles in The Saturday 
Evening Post. The study dealt with the reactions of men only. The 
readership likes and dislikes of women will form a separate study. Since 
the ultimate objective was to predict, prior to publication, how many 
male readers would start to read the published articles, the multiple re- 
gression technique was used. 


The Data 


The Articles. The present study was limited to articles which ap- 
peared in The Saturday Evening Post throughout 1946. An “article” 
was a non-fiction item that did not appear regularly, contained 10% or 
more text, and was not an editorial. There were 190 articles included 
in the study. They represented about 50% of all articles which appeared 
in the Post in 1946. The articles used were those appearing in issues of 
the Post on which readership surveys had been made. Questions were 
asked to determine the number of men who saw, started, and finished 
each item in the issue. Since the primary concern of the study was the 
article’s power to attract male readers, the criterion selected was the per- 
centage of men who saw the article and started to read it. Such per- 
centage figures will be referred to hereafter as the “starting readership 
per cent.” 

The Variables. Five variables were included in the study. These 
variables, although not necessarily the best determinants of starting 
readership per cent, were available in present records and it was believed 
that they might possess some predictive significance. The five variables 
used in the present study are primarily of interest to editors responsible 
for layout. The variables were: number of illustrations, color of illustra-
tions, sex of persons in illustrations, proportion of opening page(s) devoted
to text, and subject matter of the article.

*This study was conducted while the writer was a research associate in the Develop-
ment Division of the Research Department, The Curtis Publishing Company.

**The author wishes to express her grateful appreciation to Mr. Herbert C. Ludeke,
Manager, Development Division, Curtis Publishing Co., and to Mr. Richard Gaylord
and Dr. Hubert Brogden, The Adjutant General’s Office, who offered many useful criti-
cisms during the research and in reading the manuscript.

Number of illustrations was merely a count of the number of pictures
allotted to the article. Color of illustrations consisted of the following
three classes: full-color, duotone (usually gives the impression of black
and white with over-all color tint) and black and white. Sex of persons
in illustrations also had three classes: males only, females only, or both
males and females. Proportion of opening page(s) devoted to text was
classified into three groups: (1) articles with less than 20% of opening
page(s) given to text, (2) articles having 20% to 40% of opening page(s)
in text, and (3) articles with 40% and over of the opening page(s) devoted
to text. The categories of the subject matter variable were based on a
classification system (viz., business, war and peace, literature and the
arts, recreation, etc.) which is a modification of the one proposed by
Waples and Tyler.¹ An article was classified on the basis of its title, sub-
title, illustrations, and captions only. The text of the article was not
considered in classifying the article.


The Procedure 


Since three variables were qualitative, it was necessary to assign 
numerical values to the various classes (within the variables) in order to 
handle the data quantitatively. The first step in determining the numer- 
ical value of a given class was to calculate the mean starting readership 
per cent (criterion) for all classes. The mean starting readership (per 
cent of readers seeing and starting the article) of a class was considered 
the best indicator of its predictive potency. If, for example, articles 
containing five illustrations had a mean starting readership per cent 
that was substantially higher than the mean starting readership per cent 
of those articles with only two illustrations, then it seemed reasonable to 
give the five illustrations category more weight than the two illustrations 
category. To weight classes in their relative importance to the starting 
readership per cent implied that the higher mean criterion score should 
receive the higher numerical value. This was the practice adhered to in 
the assignment of code numbers. 
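
The coding step can be illustrated with a short sketch. The report does not give the code numbers actually assigned, so the rule used here, ranking the classes by their mean criterion score and numbering them 1, 2, 3, ..., is an assumption. It satisfies only the stated requirement that a higher mean starting readership receives a higher code. The readership figures are invented for illustration and are not data from the study.

```python
from collections import defaultdict


def code_classes_by_mean(records):
    """Assign code numbers to the classes of a qualitative variable.

    `records` is an iterable of (class_label, starting_readership_per_cent)
    pairs. Classes are ranked by their mean criterion score; the class with
    the lowest mean receives code 1, the next code 2, and so on.
    Returns (means, codes), two dictionaries keyed by class label.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, per_cent in records:
        sums[label] += per_cent
        counts[label] += 1
    means = {label: sums[label] / counts[label] for label in sums}
    ranked = sorted(means, key=means.get)
    codes = {label: rank for rank, label in enumerate(ranked, start=1)}
    return means, codes


# Hypothetical articles: (color of illustrations, starting readership per cent).
hypothetical_articles = [
    ("black and white", 40.0), ("black and white", 44.0),
    ("duotone", 50.0), ("duotone", 54.0),
    ("full-color", 60.0), ("full-color", 66.0),
]

means, codes = code_classes_by_mean(hypothetical_articles)
for label in sorted(codes, key=codes.get):
    print(f"{label}: mean {means[label]:.1f}, code {codes[label]}")
```

Because the codes are derived from the criterion itself, the coded variable necessarily correlates positively with the criterion in the sample used for coding. This is one reason a check on a fresh sample of articles is needed.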

Having coded the variables in terms of mean starting readership per 
cent, our next step was to determine the degree to which they were related 
to the criterion and to each other. Computation of all 15 intercorrela- 
tions provided the necessary data. All correlation coefficients were 
Pearson Product Moment.
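
The count of fifteen is consistent with six measures (the criterion plus the five variables) taken two at a time, although the report does not spell this out. The sketch below enumerates those pairs and gives a plain implementation of the product-moment coefficient. The short variable labels are illustrative names, not the author's.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


# The criterion plus the five coded variables (labels are illustrative).
MEASURES = (
    "starting readership per cent",
    "number of illustrations",
    "color of illustrations",
    "sex of persons in illustrations",
    "proportion of opening page(s) in text",
    "subject matter",
)

# Every unordered pair of measures: 6 * 5 / 2 = 15 intercorrelations.
pairs = list(combinations(MEASURES, 2))
print(len(pairs), "intercorrelations")
for first, second in pairs:
    print(f"r({first}, {second})")
```

Five of the fifteen coefficients relate a variable to the criterion; the remaining ten relate the variables to each other.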

Since it was found that so many of the variables were related to each
other, it would be impossible to determine their individual contribution
to the success of an article, although this is highly desirable. With the
multiple correlation procedure, however, an approach to the solution of
this problem can be reached by examination of the relative effect of each
of these variables in the composite effect on starting readership. At the
same time the over-all value of this set of variables in the optimum pre-
diction of starting readership can only be evaluated by the multiple cor-
relation procedure. The regression equation for predicting purposes was
obtained and used to predict the starting readership of articles appearing
in issues of The Saturday Evening Post during another year, 1947. This
provided an indication of the validity of the present technique in pre-
dicting the starting readership per cents of future Post articles.

¹ Waples, D., and Tyler, R. W., What People Want to Read About, pp. 224-241, 1931.


The Results and Discussion 


The findings will be presented in three sections: (1) The Distributions; 
(2) The Determination of the Composite Effect; and (3) The Cross- 
validation. 

The Distributions. Figures 1-5 show the distributions (in bar chart 
form) of starting readership per cent for each of the five variables. The 
left-hand column of each Figure indicates the various levels of readership. 
The characteristics of the articles are shown across the bottom of the 
graph. The number of articles for each class of a variable is shown in
the appropriate column. All starting readership per cents referred to in 
the article are indexes and not actual figures. 

The ultimate value of each of the variables studied cannot be deter- 
mined until they are examined, holding all others constant, because of 
the relationship among them. It is, for instance, not yet known whether 
less than 20% or between 20-39% of the opening page(s) devoted to 
text is better. Perhaps the better articles are set up with less than 20% 
opening text because of the belief that the opening page(s) of the more im- 
portant articles should contain little text. Any conclusions concerning 
the ultimate value of a particular variable are those that one would draw 
in considering this variable as the only one changing. Unfortunately, 
however, as will be later pointed out, any one variable is related to others 
and practical conclusions from these results may or may not exist. 

Figure 1 shows the distributions (in bar chart form) of starting reader-
ship per cent and the number of illustrations.

The relationship (correlation coefficient = .35) of number of illustrations 
to starting readership per cent indicates on the face of it that number of 
illustrations significantly influences the male reader in starting an article. 
In examining Figure 1 it will be apparent that there are two definite 
breaks. Thus, there are sharp changes from two illustrations and below 
to three and above and from five illustrations to six and above. The 
general trend is for starting readership to improve with the number of 
illustrations. 


Fig. 1. The effect of number of illustrations upon starting readership per cent 
(N = 190) (r = .35). 


The distributions of color of illustrations and starting readership per 
cent are given in Figure 2. 

There appears to be a definite relationship (correlation coefficient = .28) 
between the amount of color and starting the article. There are no 
breaks in the mean starting readership values for the classes in color of 
illustrations as there are for the variable, number of illustrations. There 
is an increase in the starting readership per cent progressively from black 
and white to full-color. The distribution for the category, “other,” is 
widely and evenly spread in its variation and, therefore, no conclusions 
can be drawn. “Other” includes articles having no illustrations or having 
illustrations in two-color or in black and white plus color. 


Fig. 2. The effect of color of illustrations upon starting readership per cent 
(N = 190) (r = .28). 

The influence of sex of persons in illustrations and starting readership 
is shown in Figure 3. 


Fig. 3. The effect of sex of persons in illustrations upon starting readership per cent 
(N = 190) (r = .22). 


The relationship (correlation coefficient = .22) between sex of persons 
in illustrations and starting readership also appears to be of some signifi- 
cance. As in the case of number of illustrations there is a sharp change in 
starting readership from illustrations with females only to illustrations 
showing men. Although there is a preference by the male reader for 
pictures of his own sex only, little was lost by using illustrations with both 
males and females. The higher value for articles having illustrations 
including males suggests that such articles have more preference with 
men and are more likely to be started by the male readers than articles 
which contain women in their illustrations. As pointed out previously, 
however, it may not have been the type of illustration alone but other 
correlated factors that are responsible for the starting readership. The 
subject matter of an article is important for starting readership and the 
kinds of illustrations are undoubtedly dependent upon this variable. 
“No Data” refers to articles with no illustrations or to those with 
illustrations in which there were no people or the sexes of the persons shown 
were not discernible. 

Figure 4 shows the distributions of proportion of opening page(s) devoted 
to text and starting readership per cent. 


Fig. 4. The effect of proportion of opening page(s) devoted to text upon starting 
readership per cent (N = 190) (r = -.16). 


There appears to be an inverse relationship (correlation coefficient 
= -.16) between this variable and how many men will start to read an 
article. It is apparent from Figure 4 that devoting less than 20% of the 
opening page to text results in the highest starting readership. There is 
a clear change between devoting less than 20% of the opening page(s) to 
text and devoting more than 20% to text. The general trend is for starting 
readership to improve as the per cent of text on the opening page(s) 
decreases. 

Figures 1-4 show distributions of the four variables which are ob- 
jective; that is they can be defined in only one way. The fifth variable, 
the subject matter of the article, depended upon the judgments of the 
classifiers, which were often at variance. In addition, the specific in- 
terests of the respondents, based so often upon the events of the times, 
may vary considerably. As a result, this variable cannot be considered 
so stable as the preceding ones. 

Figure 5 shows the distributions of subject matter and starting reader- 
ship. There is greater variation among the classes of this variable than 
in any other. The number of cases in any one category was often too 
few to consider the category as a separate class. This eliminated various 
classes which are part of the gamut of subjects upon which Post articles 
are written. These articles were classified under the category, “Other.” 
The relationship (correlation coefficient = .42) indicates clearly that the 
subject matter of an article considerably influences the male reader to 
start it. In examining Figure 5 it will be apparent that men who read 
The Saturday Evening Post have definite likes and dislikes of Post topics. 
Although there is a steady increase in starting readership from topics 
least liked to those best liked, there are also several sharp changes 
grouping together both similar levels of preferences and similar kinds of 
subject matter. The general trend is for male starting readership to improve 
significantly when Post articles deal with topics such as sports, description 
and analyses of campaigns, information about army life and 
personal war adventures. These topics reveal a preference by male 
readers for action-type articles. Articles on health and hygiene and 
general aspects of business offered less attraction to men. 


Fig. 5. The effect of subject matter of article upon starting readership per cent 
(N = 190) (r = .42). 


Table 1 
Intercorrelations between Variables 1-5 and Starting Readership Per Cent 
(N = 190) 

                          Starting  No.       Color     Sex       % Text on
                          Reader-   of        of        of        Opening   Subject
Variable                  ship %    Illus.    Illus.    Persons   Page(s)   Matter
Starting Readership %       --       .35       .28       .22      -.16       .42
No. of Illus.               .35      --        .52      -.17      -.14       .20
Color of Illus.             .28      .52       --       -.10      -.42       .01
Sex of Persons              .22     -.17      -.10       --        .14       .31
Per cent Text on
  Opening Page(s)          -.16     -.14      -.42       .14       --        .03
Subject Matter              .42      .20       .01       .31       .03       --

The Determination of the Composite Effect. The correlation matrix 
is shown in Table 1. The horizontal and vertical headings indicate the 
five variables used in the study. Proportion of opening page(s) devoted 
to text gave the lowest correlation (correlation coefficient = -.16) with 
starting readership per cent, while the coefficient between subject matter 
and starting readership was the highest (correlation coefficient = .42). 

Up to this point each variable has been considered individually, and 
those having the highest correlation with the criterion were taken as 
indicating the best articles. When the variables are combined, however, 
the most valuable ones are those that correlate highly with the criterion 
and only slightly with the other variables. Such a consideration 
is automatically taken care of when the multiple correlation 
coefficient is computed. 


Table 2 
Weights of Five Variables for Predicting Starting Readership Per Cent 
(N = 190) (R = .56) 


Variable                                          Weight
Subject Matter                                      .31
Number of Illustrations                             .25
Sex of Persons in Illustrations                     .20
Color of Illustrations                              .12
Proportion of Opening Page(s) Devoted to Text      -.11


The five variables studied herein should be considered all together. 
It may not be feasible, however, to achieve the optimum composite due to 
magazine policy or the expense of varying all variables optimally. The 
selection of optimum color or number of illustrations, etc. on the basis of 
means, considering these alone, can make noticeable improvement in the 
resulting starting readership. 

For prediction purposes the regression equation was computed. 
Table 2 shows the weights that each variable obtained. These weights 
are an approximation of the relative independent value of each variable 
to the success of the article. Use of this regression equation yielded a 
correlation coefficient of .56. The standard error of estimate for the R 
was 9.6%. Hence, the chances are that in about 68 out of 100 cases the 
predicted starting readership per cents will be within an error of 10 points 
or less. We may be certain that very few starting readership estimates 
will be in error by more than 30%. 
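
As a rough check on these figures, the multiple R can be recovered from the standardized weights and each variable's correlation with the criterion, since for such weights R squared equals the sum of each weight times its criterion correlation. The sketch below does this and also shows what share of predictions would fall within a given error if the errors were normally distributed with the stated standard error of 9.6. The normality assumption and all variable and function names are illustrative additions, not part of the original analysis.

```python
import math

# Standardized weights (Table 2) and correlations with the criterion (Table 1).
weights = {"subject": 0.31, "n_illus": 0.25, "sex": 0.20, "color": 0.12, "text": -0.11}
validities = {"subject": 0.42, "n_illus": 0.35, "sex": 0.22, "color": 0.28, "text": -0.16}

# For standardized regression weights, R^2 = sum(weight_i * r_i,criterion).
r_squared = sum(weights[k] * validities[k] for k in weights)
multiple_r = math.sqrt(r_squared)


def share_within(points, std_error=9.6):
    """Share of errors within +/- points, ASSUMING normally distributed errors."""
    return math.erf(points / (std_error * math.sqrt(2)))


print(round(multiple_r, 2))         # 0.56
print(round(share_within(9.6), 2))  # 0.68
print(share_within(30) > 0.99)      # True
```

Under these assumptions the computed R agrees with the reported .56, and an error of 30 points lies a little over three standard errors out, which is consistent with the statement that very few estimates will miss by that much.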

Calculation of the coded score weights (weights dependent upon the 
measuring scale of the specific variable) gave the necessary data for the 
regression equation. The final equation is as follows: 


Predicted Starting Readership Per Cent Index = 30.4 (Index) 
    + 5.2 × class value (No. of Illus.) 
    + 1.7 × class value (Color of Illus.) 
    + 4.0 × class value (Sex of Persons in Illus.) 
    + 2.2 × class value (Proportion of Opening Page[s] Devoted to Text) 
    + 1.9 × class value (Subject Matter). 


An example showing how to obtain the predicted starting readership per 
cent by the regression equation is given following the description of 
Table 3. 

Table 3 shows the class values for each of the five variables. The 
horizontal headings indicate the variables included in the study. The 
characteristics of each variable and their corresponding numerical class 
values are given in the body of the Table. To obtain the predicted 
starting readership for a given article the procedure is: (1) multiply for 
each of the five variables the appropriate class value by the coded score 
weight of the variable indicated in the regression equation; and (2) add 
up these five products plus the regression equation constant, 30.4. 

Take, for example, an article in an issue of the Post which has three 
illustrations in full-color, males only in the illustrations, 25% of the 
opening page(s) devoted to text, and whose subject is recreation, team 
sports. Reference to Table 3 gives the following class values respectively 
for these five characteristics of the article: 2, 4, 3, 2, and 7. Multiplying 
these values by the corresponding coded score weights in the regression 
equation and adding the constant, 30.4, yields a predicted starting 
readership of 77.3%. 
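
The arithmetic of this worked example can be reproduced with a short sketch. The coded score weights, the constant, and the five class values are taken from the text; the dictionary keys and the function name are hypothetical labels chosen for illustration, and Table 3 itself is not reproduced here.

```python
# Constant and coded score weights from the regression equation in the text.
CONSTANT = 30.4
CODED_WEIGHTS = {
    "n_illus": 5.2,   # number of illustrations
    "color": 1.7,     # color of illustrations
    "sex": 4.0,       # sex of persons in illustrations
    "text": 2.2,      # proportion of opening page(s) devoted to text
    "subject": 1.9,   # subject matter
}


def predicted_readership(class_values):
    """Return the predicted starting readership index for one article."""
    products = (CODED_WEIGHTS[name] * class_values[name] for name in CODED_WEIGHTS)
    return CONSTANT + sum(products)


# Class values quoted in the worked example: 2, 4, 3, 2, and 7.
example = {"n_illus": 2, "color": 4, "sex": 3, "text": 2, "subject": 7}
print(round(predicted_readership(example), 1))  # 77.3
```

The five products are 10.4, 6.8, 12.0, 4.4, and 13.3, which with the constant 30.4 sum to 77.3.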

The Cross-validation. To determine the extent to which the weights 
of the characteristics of articles would be valuable in years other than the 
year 1946, when the articles included in this study appeared, we have 
applied this regression equation to 149 articles appearing in the 1947 
issues of the Post. The correlation between the actual and predicted 
starting readership per cents was .36. It was anticipated that this cor- 
relation would be somewhat higher, although it would be expected to be 
lower than the multiple (R = .56) obtained on the validating population. 
The lower correlation in this later year is probably due to the 
change in interests over the period of the year intervening. Insufficient 
classes in the subject matter variable may also be responsible for the 
lower correlation. 

*The author has prepared a circular slide-rule called a Predictograph which has 
interesting attention-getting possibilities and incorporates similar material shown in 
Table 3. 

The average difference between the actual and predicted starting 
readership per cents was 8.3%. The predicted starting readership per 
cents were within 5% of the actual starting readership in 44% of the 
articles, within 10% in 66% of the articles, and within 15% in 86% of 
the articles. As would have been expected, those articles for which the 
predictions were close, i.e., within 5%, were for the most part articles 
which centered around the mean of the actual starting readership per 
cents. The higher or lower an observed starting readership, the poorer 
the prediction. Articles which had no similar precedents were the most 
difficult to predict. These were often the “other” cases. 
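
The accuracy measures reported above (the average difference and the share of predictions within 5, 10, and 15 points) could be computed along the following lines. This is only a sketch: the function name is hypothetical, the five pairs of values are invented purely for illustration and are not data from the study, and treating "within" as "less than or equal to" is an assumption.

```python
def accuracy_summary(actual, predicted, tolerances=(5, 10, 15)):
    """Mean absolute difference and share of predictions within each tolerance."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mean_abs = sum(errors) / len(errors)
    # "Within t" is taken to mean an absolute error of t points or less.
    within = {t: sum(e <= t for e in errors) / len(errors) for t in tolerances}
    return mean_abs, within


# HYPOTHETICAL index values for five articles, for illustration only.
actual = [50, 62, 45, 71, 58]
predicted = [53, 55, 47, 59, 58]

mean_abs, within = accuracy_summary(actual, predicted)
print(mean_abs)  # 4.8
print(within)    # {5: 0.6, 10: 0.8, 15: 1.0}
```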


The Applications 


The results of this study can be of value as aids to judgments on 
editorial matters. Thus the present regression equation gives the best 
possible prediction of the number of men who will start to read an article 
in the Post when only five easily determined variables are considered. 
Further research is being conducted to determine: (1) the value of these 
in predicting feminine readership; and (2) the value of these and 23 other 
variables in predicting readership. 

The primary application of the present regression equation and the 
simplified methods of computation outlined above lies in checking the 
value of a tentative layout for an article. When the estimated reader- 
ship per cent is below average, changes can be made, prior to publication, 
with an increase in the average readership of each issue of the magazine. 

The primary limitation that should be considered is the change in 
weights, and thus the equation’s predictive value, as conditions change 
with time. Such change may result from: (1) the selection of new articles 
from the wide range of “types” of articles which can be written; (2) the 
varying and changing needs, interests, and preferences of a large group 
of readers; and (3) the unpredictability of the events which may occur 
in a given period and which influence readership. This would imply 
continued follow up with further study from time to time. 

Although the present study is specifically pertinent to layout features 
of articles in a magazine, the multiple regression technique can also be 
used for content evaluation, following the search for appropriate and 
predictive variables. Perhaps in time, the results of both layout and 
content analyses may, after adequate and intense investigation of the 
predictive significance of each variable and of the accuracy of their com- 
bined prediction, partially eliminate the need for the regular, current 
surveys. The necessity, however, for constant validation and revision 
due to changing conditions of time and interests, excludes the practical 
possibility that predictions alone can ever completely supplant the survey 
method, although, if highly accurate, predictions should reduce both 
operating and survey time and expense. 


Summary and Conclusions 


One hundred and ninety articles in The Saturday Evening Post through- 
out 1946 were analyzed in an attempt to predict the number of men who 
would start to read the articles. Five variables believed to be deter- 
minants of starting readership were studied. These variables were: (1) 
number of illustrations; (2) color of illustrations; (3) sex of persons in 
illustrations; (4) proportion of opening page(s) devoted to text; and (5) 
subject matter of the article. The criterion (starting readership per cent) 
was the number of men who saw the article and started to read it. These 
figures were obtained from readership survey results. 

The multiple correlation technique was followed. Calculation of the 
regression equation permitted prediction of the starting readership per 
cents. To check the validity of this equation, the starting readership 
per cents of 149 articles appearing in the 1947 Post were predicted and 
compared with the actual starting readership per cents obtained by the 
survey method. 

The following conclusions are supported: 


1. The multiple correlation and regression technique proved to be a 
successful method for predicting starting readership of Post articles by 
male readers. 

2. The accuracy of the predictions of future articles should fall within 
a 10% difference between predicted and actual starting readership per 
cents in about 68% of the cases. This percentage error is satisfactory 
for most practical purposes. 

3. The order of the relative importance of the five variables included 
in this study is: (a) subject matter; (b) number of illustrations; (c) sex of 
persons in illustrations; (d) color of illustrations; and (e) proportion of 
opening page(s) devoted to text. 


Received May 1, 1948. 


Kelly, George A. New methods in applied psychology. College Park, 
Maryland: University of Maryland, 1947. Pp. viii + 301. 

The title creates the illusion of a systematic and comprehensive treat- 
ment. The long sub-title indicates the more modest nature of the volume 
which represents the “Proceedings of the Maryland Conference on 
Military Contributions to Methodology in Applied Psychology held at 
the University of Maryland, November 27-28, 1945 under the auspices 
of the Military Division of the American Psychological Association.” 

The contributions of psychologists to questions emerging in connection 
with the American war effort covered a wide range of problems. The 
material presented at the Maryland Conference concerns the establish- 
ment of criteria of performance (5 papers), methods for classification of 
military personnel (5), selection of officers (3), training (4), group morale 
(1), psychological research on military equipment (5), statistics and 
theory of psychological measurement (5), relationships between psy- 
chology and psychiatry (2), clinical diagnosis of mental disturbances (3), 
and use of electroencephalography (2 papers). 

A large chapter is devoted to post-war developments in the various 
branches of military psychology: personnel psychology (M. W. Richard- 
son, R. N. Faulkner), aviation psychology (J. C. Flanagan, J. G. Jenkins), 
clinical psychology (M. A. Seidenfeld, W. A. Hunt), and military research 
by civilian psychologists (C. W. Bray, M. S. Viteles), with a general 
summary and recommendations by D. G. Marquis. 

It is impossible to evaluate here separately each contribution or even 
the larger units. The readers will appreciate the summaries by J. C. 
Flanagan (instruments of measurement), W. R. Miles (engineering for 
human use), L. F. Shaffer (clinical techniques), and M. S. Viteles (criteria 
of performance) which integrate and round out the individual contri- 
butions to a given area. 

The comment will be limited to a few points. The volume leaves the 
impression that American psychology was methodologically well pre- 
pared to meet the problems posed by the war emergency. This does 
not mean that the specific techniques were always at hand. However, 
the training in sound research methods gave the psychologists a mental 
equipment with which to create the tools for attacking a given problem. 
One area, in particular, which witnessed a creative expansion of the 
study of behavior into a largely new phase of “human engineering” was 
the field of experimental testing of military equipment. It is hoped that 
the principles and techniques developed in this field (and in other fields 
of military psychology) will find a fruitful use in peace-times in designing 
industrial tools and machines. 

We should like to close the review with reference to the keynote address 
of the conference, “New opportunities and new responsibilities for the 
psychologists,” given by J. G. Jenkins, chairman of the Department 
of Psychology at the University of Maryland and war-time head of the 
Aviation Psychology branch, Bureau of Medicine and Surgery, U. S. 
Navy. Jenkins stressed that the research psychologist should choose 
problems not so much on account of their methodological neatness as 
because of their promise of returns which are of social importance. This 
means adding the criterion of social significance of a research project to 
the accepted criterion of statistical significance of the results. This is a 
serious challenge to the hundreds of Ph.D. candidates looking every year 
for a subject for their thesis. May it not be a challenge for their ad- 
visers, too? 

Josef Brozek 


Laboratory of Physiological Hygiene, 
University of Minnesota 


DeGruchy, Clare. Creative old age. San Francisco: Old Age Counseling 
Center, 1946. Pp. 148. $2.75. 

Lawton, George. Aging successfully. New York: Columbia University 
Press, 1946. Pp. 266. $2.75. 

Creative Old Age is the last in a trilogy of books on old age counseling. 
Dr. Lillien J. Martin, who died in 1943, still alert at 91, began her old age 
counseling in 1921. After nine years she published Salvaging Old Age in 1930, 
which was superseded by Sweeping the Cobwebs in 1933, which is the basic 
book on the Martin method in old age counseling. After Dr. Martin’s 
death her long-time associate and assistant, Mrs. DeGruchy, brought 
out the second book in the trilogy, A Handbook for Old Age Counselors, 
1944, which outlines more clearly for counselors the five counseling 
sessions usual in the Martin technique. 

But during Dr. Martin’s life and afterwards psychologists and coun- 
selors kept asking for a book of cases or case histories. This, at last, has 
been produced by Mrs. DeGruchy. In these three books, a trained and 
competent counselor who has occasional later maturity or old age cases 
presented to him, now has an old age counseling kit. Meanwhile, Mrs. 
DeGruchy is preparing a life of Dr. Martin and giving much time to 
training old age counselors at the San Francisco center. 
In Creative Old Age, Mrs. DeGruchy has preferred to give in artistic 
short-story form a dozen case histories, plus an account of two group 
projects. The practicing counselor, with loan or sale copies available, 
might well give it to a prospective or actual counselee to arouse faith and 
expectancy of results through contemplation of these “resurrections,” 
as it were. For these are not just ordinary cases. Nearly any one of 
them would be a great challenge to any counselor. The counselor who 
has mastered the first and second books of the trilogy will find his curiosity 
on actual cases largely satisfied by this book. 

However, only the experience of a score or a hundred cases of his 
own will teach him what can’t be put in books. Dr. Martin told the 
writer that she did her first three hundred cases without fee, because, she 
said with a twinkle, “How did I know I had done more good than harm?” 
Dr. Martin felt that her professional task in old age counseling had been 
to develop the technique, leaving it to her successors to multiply old age 
counseling centers. 

Dr. Martin and Mrs. DeGruchy hold the thesis that we should never 
retire or quit, but rather continue alive and living to the end with work 
(for money or for mental wages), with play (preferably active and self- 
expressive) and with rest, as parts of each day. Putting a meaning into 
life by finding a pattern (for revision if desirable), by finding his own best 
work or service expression, and by finding his own best play or recreation 
expression, is the heart of the Martin method in old age counseling. 

Aging Successfully is another book a counselor might loan to a coun- 
selee, but perhaps only to read some special chapter, for the book itself 
is twice as long as Creative Old Age. Many chapters seem very long and 
might well be made into two shorter chapters. “To write a book one 
must accumulate information, arrange it appropriately, and present it in 
an interesting fashion,” says Dr. Lawton to people who expect to “retire 
and write a book,” “just like that.” His quotations, jokes, poems, illus- 
trations, statistics, cases, are such as to make one wish for an index to 
find them again, though this book tries to be popular and not a learned 
volume with many footnotes. 

The fifteen chapters are addressed to all of us who are growing older, 
which means from the cradle up. In fact, Dr. Lawton says, old age 
patterns develop at nursery school age (p. 59). The book as a whole is 
a book on maturing and later maturity rather than on old age as such. 
It is a book on how to prevent old age rather than on how to cure old 
age. Don’t “grow old gracefully,” says Dr. Lawton, rather “grow old 
aggressively.” 

Many books are now appearing on this subject; more will follow as we 
become aware that 30 per cent of us are over 45, numbering 42,000,000. 
Soon 50 may be the “deadline” for retirement; we have 25,000,000 over 
50, 13,000,000 over 60, 5,000,000 over 70 and 1,000,000 over 80. Good 
books recently to appear include such as After 70 by Herbert N. Casson, 
the efficiency expert, and The Best Years by Walter B. Pitkin (1946). 
But Aging Successfully is different. For Dr. Lawton has been a specialist 
in adolescence and a professional school psychologist, and since 1936 has 
concentrated on problems of later maturity. He has been a psychological 
advisor for several old folks’ homes and has done much individual later 
maturity and old age counseling. He got his ideas together for popular 
presentation in a Cooper Union lecture course in 1945-46 and this book 
is the outcome. Everywhere, however popular the style, it is evident 
that a psychologist dictated the lines. 

Dr. Lawton spent ten days with Dr. Martin in 1943, just before her 
death, and he pays her high tribute, but he does not pretend to follow 
the Martin methods that she took fifteen or twenty years to evolve. He 
thinks “no special ‘new’ psychology, whether in theory or practice, has 
been invented in order to understand or to deal with the difficulties of 
older people.” His book is just “the application to a hitherto unexplored 
field—middle and later maturity—of universally tested and accepted 
principles of clinical psychology, mental hygiene, education, vocational 
counseling, rehabilitation.” 

Like Dr. Martin, Dr. Lawton in his chapter “Retire to, not from,” 
favors life-long aliveness, adventuring, usefulness, vitality. A novel 
feature is his “Bill of Rights for Old Age” in the form of a radio program 
given by older folk themselves. 

While both these books are on psychology each hints that the base 
problem of old age is even more sociological and economic. “The 
Eskimos put their old out on the ice floes. We prolong the lives of our 
old as long as possible but deny them opportunities for useful activity 
and personal enrichment.” Hence, both these books in a way are books 
on vocational counseling for later maturity and old age in a battle against 
an industrial society which, unlike an agricultural or socialized society, no 
longer has three-generation homes. In New York City are one hundred 
employment agencies, but not one of them especially for people over forty. 

Dr. Martin says that feeling unhappy and feeling useless are the tragic 
signs of old age and the emergency calls for old age counseling. If our 
social order were to provide their quota of jobs for those past 40 and half 
time jobs for those fit to work after sixty, this new sense of being wanted, 
needed and useful would alone, in America, be equal to one hundred 
thousand maturity and old age counselors in promoting good mental 
hygiene and happiness in people past forty, and in taking dread from 
people approaching forty. 
Christopher Ruess 
American Institute of Family Relations, 
Los Angeles 


Jucius, M. J., Maynard, H. H., and Shartle, C. L. Job analysis for 
retail stores. Research Monograph Number 37, Bureau of Business 
Research, Ohio State University, 1945. Pp. 65. $2.00. 


This is a “how-to-do-it” manual, a description of how a job analysis 
program may be developed in distributive industries. It deals with the 
values, limitations, and procedures in making a job analysis and describes 
how the results of job analysis can be applied to job evaluation in retail 
stores. The main contribution of this monograph is the detailed descrip- 
tion of the job analysis and job evaluation procedures with sample forms 
and instruction sheets. Although certain government publications pre- 
sent job analysis and job evaluation materials at a more reasonable price, 
and the importance of obtaining worker cooperation in the installation of 
these procedures is barely mentioned, the manual serves a useful function 
for the personnel worker in a distributive industry. 


William A. McClelland 
Brown University 


Mursell, James L. Psychological testing. New York: Longmans, Green 
& Co., 1947. Pp. 449. $4.00. 


Apparently organized as a text for education students with previous 
statistical training who will have little additional instruction on testing 
but who will be called upon to utilize test data frequently, Mursell 
describes his work as a “comprehensive and balanced account of the 
testing movement in psychology.” The material is drawn from areas 
of intelligence, aptitude, personality, interest, attitude and character 
study. His emphasis is on “intelligent comprehension . . . rather than 
detailed account of findings.” He also includes a considerable amount of 
standard ID’s subject matter, reinforced by a 521 item bibliography—not 
counting manuals from the 93 tests referred to in the text. 

Evaluation should be largely in terms of the author’s own aims, but 
there are some general criticisms which any book should meet. In a re- 
cent publication, recent material is expected. In terms of the tests 
selected, the forms described, the validation studies cited, the content 
is often wanting. The tests for discussion must serve many purposes. 
As examples of the various types they must be either popular, useful, or 
grossly bad. They must satisfy the student requirements of being con- 
versant with tests he is likely to encounter, of being aware of serious 
drawbacks or unusual potentialities in some of them, of knowing how to 
interpret or when to ask for the tests available. Here again, the choice 
however large is poor. There are glaring omissions, like the Kuder 
Preference Record in the area of interests, the MMPI and the Guilford 
batteries in personality. The test criticisms are often perfunctory, 
occasionally erroneous. Although it is delightful to find how effectively 
he dispenses with Link’s Personality Quotient Test, it is strange to see the 
Humm-Wadsworth actually given considerable acceptance. There is 
no one place where the beginner is systematically instructed in the 
procedures and sources for evaluating tests not included in the text. 

In meeting his own aims, Mursell’s performance is checkered. It 
seems he treats rather well methodology in theory and practice—the 
usual problems of reliability, validation, standardization, etc. but often 
contradicts himself later in discussion of specific tests. Definitions are 
important for beginners, but he seems to be troubled by them; often mere 
reference to Warren’s Dictionary would settle them. Scant attention is 
given to the history of the testing movement, although this would seem an 
excellent vehicle for the large emphasis he places on controversies and 
problems. Gratifyingly, he takes up many other issues—e.g., factor 
analysis, scoring technics, projective methods, limitations and potential- 
ities of psychometrics—expressing opinions which can of course only be 
appraised clearly after more data are available. Students should be aware 
of these issues, and be informed sufficiently to follow the developments as 
the movement grows. The inclusion of the ID’s material is of questionable 
value. Naturally anyone dealing with tests should be familiar with 
these findings, but of inestimably greater worth would be careful guidance 
in how to utilize specifically the data in actual testing and interpreting. 
Mere quotation of researches does not seem to achieve this, and would 
perhaps be more meaningfully presented in a course devoted to differential 
psychology. Nonetheless, the author has recognized a real need to be 
served by a book of this kind; the testing movement will gain real strength 
in proportion to the sophistication instilled in those actually called upon 
to utilize its results. 


Ohio Wesleyan University 


W. Grant Dahlstrom 


Moore, Bruce V., Kennedy, J. Ewing, and Castore, George F. The 
work, training and status of supervisors as reported by supervisors in
industry. Department of Psychology and Management Training 
Service, The Pennsylvania State College, 1946, pp. 31. $1.00. 

This study presents the results of a questionnaire submitted to 873
supervisors in industries throughout Pennsylvania during the months of
April and May of 1946. Men who were then currently employed as
foremen were asked their opinions about their own jobs, foremen and
supervisors in industry. The areas covered by the questionnaire con- 
cerned the duties considered to be the most important responsibilities of a 
supervisor, the training that was obtained prior to promotion to a supervisory
position, the training that supervisors felt they should have received,
and whether the supervisor identifies himself with management
or with the employee working force. Certain autobiographical
information as to age, length of service, both as an employee in the company
and as a supervisor, and the number of workers supervised was also
obtained. 

Of the total number of supervisors reporting, 231 statements were 
obtained by means of personal interview and the remaining 642 were 
compiled from unsigned questionnaires mailed directly to Pennsylvania 
State College. In both instances complete anonymity was maintained. 
It is interesting to note that there appears to be no significant difference 
in the results of the personal interview and the questionnaire method.

Several interesting highlights stand out as a result of this study. In 
defining their job most of the supervisors considered that their prime 
responsibility was to show others how to do the work and secondly a 
knowledge of men and how to keep them loyal and working. The average 
amount of training received by the supervisors was less than one year 
and in no instance did a supervisor report that he felt he had had sufficient 
training for the job. Approximately 60% of the supervisors merely carry 
out orders for management and only 15% help make policies affecting 
their departments. Yet, despite this lack of authority, about 70% of 
these men prefer to consider themselves as a part of management. Their 
major criticism of their job as supervisors seems to center around the fact 
that most of them feel that they have bad management and that manage- 
ment could do a great deal for them by keeping promises and explaining 
policy more completely. 

In conclusion, this study presents a well-ordered statistical summary 
of the present day status of the supervisor in industry. It is also signifi- 
cant in terms of methodology in that there seem to be no reliable differences 
between the information obtained from the questionnaires and from the 
personal interviews. 

Henry L. Sisk 


Stevenson, Jordan and Harrison, Inc. 
Chicago, Illinois 
New Books, Monographs, and Pamphlets


Books, monographs, and pamphlets for listing and possible review should be sent to 
Donald G. Paterson, Editor, Department of Psychology, University 
of Minnesota, Minneapolis 14, Minnesota 


Family, marriage, and parenthood. Howard Becker and Reuben Hill,
Editors. Boston: D. C. Heath and Co., 1948. Pp. 829. $5.00.

Projective techniques. John E. Bell. New York: Longmans, Green and
Co., Inc., 1948. Pp. 512. $4.00.

Psychology for pastor and people. John S. Bonnell. New York: Harper 
and Brothers, 1948. Pp. 225. $2.50. 

Christian paths to self-acceptance. Robert H. Bonthius. New York: 
Columbia University Press, 1948. Pp. 254. $3.25. 

Development of the basic Rorschach score with manual of directions. Char- 
lotte Buhler, Karl Buhler, and D. Welty Lefever. Los Angeles: 
Rorschach Standardization Studies, 4759 Hollywood Blvd., 1948. 
Pp. 190. $3.00. 

The anatomy of melancholy. Robert Burton. New York: M. W. Drexler 
Book Co., 1948. Pp. 1036. $3.48. 

The myth of the magus. E. M. Butler. New York: The Macmillan Co.,
1948. Pp. 282. $8.75. 

How to supervise people in industry. Eliot D. Chapple and Edmond F. 
Wright. Deep River, Conn.: National Foremen’s Institute, Inc., 
1948. Pp. 128. $2.50. 

Pupil personnel service. Frank G. Davis, Editor. Scranton, Pa.: Inter- 
national Textbook Co., 1948. Pp. 638. $3.75. 

Educational psychology. Robert A. Davis. New York: McGraw-Hill 
THE EMOTIONS: Outline of A Theory
By JEAN-PAUL SARTRE 


In this small volume, the eminent French philosopher develops a new theory of
psychological interpretation. Delving into the “magic” of the emotional process, he
analyzes the roles which fear, lust, melancholy and anguish play in the life of man, and
what is the true reality of conscious life.


Sartre’s new book is a significant step on the way to the understanding of the emotions. 


THE PSYCHOLOGY OF 
IMAGINATION 

By JEAN-PAUL SARTRE 

Describes the great oe function
of the mind. By means of a close
consideration of the operations of imagi- 
nation and of the nature of the imaginary, 
Sartre reveals a new way of conceiving 
the nature of the psychic life and of the 
mind’s relationship with the external 
world. $3.75 
Europe.” —Wolfgang Kohler, Swarthmore
Edited by EMIL FROESCHELS, M.D.
Speech and voice therapy has now developed
to a remarkably high degree.
In this work, the editor points out the 
various aspects of the science with the 
assistance of numerous collaborators 
well known in the field. $6.00 
By RICHARD A. WILSON

Deep and original thinking about the 
functions of speech in the life of man. 
So impressed was George Bernard Shaw 
by “the magnitude of the service the 
author has done,” that he wrote a 
lengthy essay to serve as an introduction 
to this book. $3.75 


PSYCHIATRY FOR 
By J. A. C. BROWN


An introduction to psychiatry with much 
practical information for the layman. 
“Sound, reliable and well balanced.”—
N. Y. State Jrnl. of Med. $3.00


SELECTED WRITINGS OF 
BENJAMIN RUSH 

Edited by DAGOBERT D. RUNES
“Pioneer in American Psychiatry,” Dr.
liberal and frequently an almost visionary
mind. $5.00
APPLIED
PSYCHOLOGY 


REVISED EDITION 


By RICHARD W. HUSBAND
Associate Professor of Psychology, Iowa State College 


The many teachers who used this book in the original
edition, which has been out of print for some years, will
recognize the new edition only by its practical, realistic,
and lively treatment. The subject matter is almost completely
new, or at least extensively reworked. The great
developments in the applied psychology field during the
past fifteen years account for much of the change, and the
author’s practical experience in industrial relations in recent
years is responsible for much of the new material,
especially on the applications of psychology in industry.
Among the new topics treated at length are Vocational
Aptitudes and Adjustment, Employment Procedures, The
Customer’s Side (of sales and advertising), and Psychological
Factors in Marriage. We urge you to examine this
volume before you order second-semester texts.